Publication | Closed Access
Knowledge Distillation Using Output Errors for Self-attention End-to-end Models
Citations: 33
References: 17
Year: 2019
Unknown Venue
Artificial Intelligence, Engineering, Machine Learning, Self-attention End-to-end Models, Recurrent Neural Network, Speech Recognition, Natural Language Processing, Self-attention ASR Models, Data Science, Knowledge Processing, Real-time Language, Machine Translation, Computer Science, Deep Learning, Model Compression, Knowledge Distillation, Knowledge Distillation Method, Knowledge Modeling, Speech Processing, Knowledge Management
Most automatic speech recognition (ASR) neural network models are unsuitable for mobile devices because of their large size, so the model size must be reduced to fit limited hardware resources. In this study, we investigate sequence-level knowledge distillation techniques for compressing self-attention ASR models. To counter the performance degradation of compressed models, our proposed method adds an exponential weight to the sequence-level knowledge distillation loss function; this weight reflects the word error rate of the teacher model's output, measured against the ground-truth word sequences. Evaluated on the LibriSpeech dataset, the proposed knowledge distillation method achieves significant improvements over the student baseline model.
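The abstract does not give the exact form of the weighted loss, so the following is only a minimal sketch of the general idea: compute the word error rate (WER) of the teacher's hypothesis against the ground truth, and scale the student's sequence-level distillation loss by an exponential function of that WER. The form `exp(-alpha * WER)`, the parameter `alpha`, and the function names `word_error_rate` and `weighted_seq_kd_loss` are all assumptions made for illustration, not the paper's definitions.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the processed prefix of
    # ref and the first j words of hyp.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / max(len(ref), 1)


def weighted_seq_kd_loss(student_nll: float,
                         reference: str,
                         teacher_hypothesis: str,
                         alpha: float = 1.0) -> float:
    """Scale the student's loss on the teacher hypothesis by exp(-alpha * WER).

    ASSUMPTION: this exponential form and `alpha` are illustrative only;
    the abstract states that the weight is exponential and reflects the
    teacher's WER, but not how it is parameterised.
    """
    wer = word_error_rate(reference, teacher_hypothesis)
    weight = math.exp(-alpha * wer)
    return weight * student_nll
```

Under this assumed form, a teacher hypothesis that matches the ground truth keeps the full distillation loss (weight 1), while hypotheses with more word errors contribute less to the student's training signal.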