Publication | Open Access
Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
Year: 2017
Citations: 341
References: 32
Convolutional Neural Network, Engineering, Machine Learning, Cognition, Social Sciences, Image Analysis, Data Science, Pattern Recognition, Connectionism, Cognitive Neuroscience, Cognitive Science, Knowledge Transfer, Feature Learning, Neuroinformatics, Computer Science, Deep Learning, Knowledge Distill, Model Compression, Original Loss Function, Deep Neural Networks, Knowledge Distillation, Computational Neuroscience, Neuroscience, Transfer Learning
Deep neural networks deliver high performance at the cost of large storage and computation, prompting research into compression and acceleration techniques such as knowledge transfer (KT). This study introduces a knowledge transfer method that frames transfer as a distribution matching problem: it matches the distributions of neuron selectivity patterns between teacher and student networks by minimizing a maximum mean discrepancy loss. The method is validated across multiple datasets and combined with other KT techniques, and the resulting model is fine-tuned for other tasks such as object detection. The results show that the method substantially boosts student network performance and that the learned features transfer to other tasks.
Although deep neural networks have demonstrated extraordinary power in various applications, their superior performance comes at the expense of high storage and computational costs. Consequently, the acceleration and compression of neural networks have attracted much attention recently. Knowledge Transfer (KT), which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the popular solutions. In this paper, we propose a novel knowledge transfer method by treating it as a distribution matching problem. In particular, we match the distributions of neuron selectivity patterns between teacher and student networks. To achieve this goal, we devise a new KT loss function that minimizes the Maximum Mean Discrepancy (MMD) metric between these distributions. Combined with the original loss function, our method can significantly improve the performance of student networks. We validate the effectiveness of our method across several datasets, and further combine it with other KT methods to explore the best possible results. Last but not least, we fine-tune the model on other tasks such as object detection. The results are also encouraging and confirm the transferability of the learned features.
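The loss described in the abstract, an MMD term added to the original task loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the kernel, the estimator, or how selectivity patterns are extracted. The sketch assumes that each channel's flattened, L2-normalized activation map counts as one sample, that a Gaussian kernel with bandwidth `sigma` is used, and that the two terms are balanced by a hypothetical `weight` parameter. All function names are illustrative.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise Gaussian kernel between rows of x (n, d) and y (m, d)."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between sample sets x and y."""
    return (
        gaussian_kernel(x, x, sigma).mean()
        + gaussian_kernel(y, y, sigma).mean()
        - 2.0 * gaussian_kernel(x, y, sigma).mean()
    )


def selectivity_samples(feature_map):
    """Assumption: treat each channel of a (C, H, W) feature map as one
    sample by flattening its spatial activations and L2-normalizing."""
    channels = feature_map.shape[0]
    flat = feature_map.reshape(channels, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    return flat / np.maximum(norms, 1e-12)


def total_loss(task_loss, teacher_fm, student_fm, weight=1.0, sigma=1.0):
    """Original task loss plus a weighted MMD transfer term.
    `weight` is a hypothetical balancing hyperparameter."""
    transfer = mmd2(
        selectivity_samples(teacher_fm),
        selectivity_samples(student_fm),
        sigma,
    )
    return task_loss + weight * transfer


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 4, 4))  # 8 channels, 4x4 spatial map
    student = rng.normal(size=(4, 4, 4))  # fewer channels, same spatial size
    print(total_loss(task_loss=0.5, teacher_fm=teacher, student_fm=student))
```

Because each sample is a flattened spatial map, the teacher and student may have different channel counts in this sketch, but the spatial size of the two feature maps must agree. In actual training the transfer term would be computed inside an automatic differentiation framework so that gradients reach the student's weights.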