Publication | Closed Access
Performance-Aware Mutual Knowledge Distillation for Improving Neural Architecture Search
Citations: 16
References: 44
Year: 2022
Topics: Artificial Intelligence, Knowledge Representation, Evolving Neural Network, Engineering, Machine Learning, Data Science, Knowledge Distillation, Automated Reasoning, Knowledge Engineering, Computer Science, Mutual Knowledge Distillation, Neural Architecture Search
Knowledge distillation has shown great effectiveness for improving neural architecture search (NAS). Mutual knowledge distillation (MKD), where a group of models mutually generate knowledge to train each other, has achieved promising results in many applications. In existing MKD methods, mutual knowledge distillation is performed between models without scrutiny: a worse-performing model is allowed to generate knowledge to train a better-performing model, which may lead to collective failures. To address this problem, we propose a performance-aware MKD (PAMKD) approach for NAS, where knowledge generated by model $A$ is allowed to train model $B$ only if the performance of $A$ is better than that of $B$. We propose a three-level optimization framework to formulate PAMKD, where three learning stages are performed end-to-end: 1) each model is trained independently to produce an initial model; 2) the initial models are evaluated on a validation set, and better-performing models generate knowledge to train worse-performing models; 3) architectures are updated by minimizing a validation loss. Experimental results on a variety of datasets demonstrate that our method is effective.
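To illustrate the performance-aware gating rule described in the abstract, the following is a minimal, hypothetical PyTorch sketch: each model's loss combines cross-entropy with distillation terms taken only from peers whose validation accuracy is strictly higher. It is not the authors' implementation; the temperature-scaled KL distillation form, the `val_accs` ranking signal, and the weighting factor `alpha` are assumptions.

```python
import torch
import torch.nn.functional as F

def pamkd_losses(models, x, y, val_accs, temperature=4.0, alpha=0.5):
    """Return one loss per model: cross-entropy plus distillation terms
    taken only from peers with strictly higher validation accuracy."""
    logits = [m(x) for m in models]
    losses = []
    for i, li in enumerate(logits):
        loss = F.cross_entropy(li, y)
        # Performance-aware gating: collect soft targets only from
        # better-performing peers (worse peers may not teach better ones).
        teachers = [j for j in range(len(models))
                    if j != i and val_accs[j] > val_accs[i]]
        if teachers:
            kd = 0.0
            for j in teachers:
                # Temperature-scaled KL distillation term (assumed form).
                p_teacher = F.softmax(logits[j].detach() / temperature, dim=1)
                log_p_student = F.log_softmax(li / temperature, dim=1)
                kd = kd + F.kl_div(log_p_student, p_teacher,
                                   reduction="batchmean") * temperature ** 2
            loss = loss + alpha * kd / len(teachers)
        losses.append(loss)
    return losses
```

In this sketch, the gating decision depends only on the relative ordering of validation accuracies, which mirrors the paper's stated rule that model $A$ may train model $B$ only if $A$ outperforms $B$; how the paper measures performance and weights the distillation terms within its three-level optimization is not specified here.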