Publication | Closed Access
Semi-Supervised ASR by End-to-End Self-Training
Citations: 36
References: 28
Year: 2020
Unknown Venue
Structured Prediction, Engineering, Machine Learning, Semi-supervised ASR, Spoken Language Processing, Speech Recognition, Natural Language Processing, Data Science, Pattern Recognition, Self-supervised Learning, Robot Learning, Real-time Language, Semi-supervised Learning, Supervised Learning, Machine Translation, Semi-supervised ASR Setting, Data Augmentation, Computer Science, Deep Learning, Multi-speaker Speech Recognition, Speech Processing, Speech Input
While deep-learning-based end-to-end automatic speech recognition (ASR) systems have greatly simplified modeling pipelines, they suffer from data sparsity. In this work, we propose a self-training method with an end-to-end system for semi-supervised ASR. Starting from a Connectionist Temporal Classification (CTC) system trained on the supervised data, we iteratively generate pseudo-labels on a mini-batch of unsupervised utterances with the current model, and use the pseudo-labels to augment the supervised data for an immediate model update. Our method retains the simplicity of end-to-end ASR systems, and can be seen as performing alternating optimization over a well-defined learning objective. We also perform empirical investigations of our method, regarding the effect of data augmentation, the decoding beam size for pseudo-label generation, and the freshness of pseudo-labels. On a commonly used semi-supervised ASR setting with the Wall Street Journal (WSJ) corpus, our method gives a 14.4% relative WER improvement over a carefully trained base system with data augmentation, reducing the performance gap between the base system and the oracle system by 46%.
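The iterative procedure described in the abstract (pseudo-label a mini-batch of unlabeled data with the current model, then update immediately on the supervised plus pseudo-labeled data) can be illustrated with a minimal sketch. This is not the authors' implementation: a toy one-dimensional nearest-centroid classifier stands in for the CTC acoustic model, nearest-centroid assignment stands in for beam-search decoding, and a moving-average step stands in for the gradient update. All names (`decode`, `update`, `self_train`) and all data values are hypothetical and chosen only for illustration.

```python
import random
from typing import List, Tuple

Model = Tuple[float, float]   # toy model (assumption): one centroid per class
Example = Tuple[float, int]   # (feature, label)


def decode(model: Model, x: float) -> int:
    """Stand-in for pseudo-label decoding: assign the nearest centroid's class."""
    c0, c1 = model
    return 0 if abs(x - c0) <= abs(x - c1) else 1


def update(model: Model, batch: List[Example], lr: float = 0.5) -> Model:
    """Stand-in for one training step: move each centroid toward its batch mean."""
    centroids = list(model)
    for k in (0, 1):
        xs = [x for x, y in batch if y == k]
        if xs:  # skip classes that do not appear in this batch
            centroids[k] = (1 - lr) * centroids[k] + lr * sum(xs) / len(xs)
    return (centroids[0], centroids[1])


def self_train(model: Model, labeled: List[Example], unlabeled: List[float],
               steps: int = 20, batch_size: int = 2, seed: int = 0) -> Model:
    """Online self-training: pseudo-labels always come from the current model."""
    rng = random.Random(seed)
    for _ in range(steps):
        sup_batch = rng.sample(labeled, batch_size)
        unsup_inputs = rng.sample(unlabeled, batch_size)
        # Generate fresh pseudo-labels with the current model ...
        pseudo_batch = [(x, decode(model, x)) for x in unsup_inputs]
        # ... and use them right away, alongside supervised data, for an update.
        model = update(model, sup_batch + pseudo_batch)
    return model


# Hypothetical toy data: class 0 lies near 0, class 1 lies near 10.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 1.5, 2.0, 8.0, 8.5, 9.5]

base_model = (0.5, 9.5)  # "base system": class means of the supervised data
final_model = self_train(base_model, labeled, unlabeled)
```

Because each pseudo-label is produced immediately before the update that consumes it, the labels are never stale; this mirrors the "freshness" aspect the abstract mentions, as opposed to labeling the whole unlabeled set once with a fixed model.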