Improved Noisy Student Training for Automatic Speech Recognition

Abstract

Recently, a semi-supervised learning method known as "noisy student training" has been shown to significantly improve the image classification performance of deep networks. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for automatic speech recognition, employing (adaptive) SpecAugment as the augmentation method. We find effective methods to filter, balance and augment the data generated in between self-training iterations. By doing so, we are able to obtain word error rates (WERs) of 4.2%/8.6% on the clean/noisy LibriSpeech test sets by using only the clean 100h subset of LibriSpeech as the supervised set and the rest (860h) as the unlabeled set. Furthermore, we are able to achieve WERs of 1.7%/3.4% on the clean/noisy LibriSpeech test sets by using the unlab-60k subset of LibriLight as the unlabeled set for LibriSpeech 960h. We are thus able to improve upon the previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h (4.74%/12.20%) and LibriSpeech (1.9%/4.1%).
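The iterative structure described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `noisy_student_loop` and the callables `train`, `transcribe`, `augment` and `keep` are hypothetical placeholders, the filtering criterion is left entirely to the caller, and the balancing step mentioned in the abstract is omitted because the abstract does not specify it.

```python
from typing import Any, Callable, List, Tuple

Example = Tuple[Any, Any]  # (audio features, transcript); both types are placeholders


def noisy_student_loop(
    labeled: List[Example],
    unlabeled: List[Any],
    train: Callable[[List[Example]], Any],   # hypothetical: fits a model on (input, label) pairs
    transcribe: Callable[[Any, Any], Any],   # hypothetical: (model, input) -> predicted transcript
    augment: Callable[[Any], Any],           # hypothetical: e.g. a SpecAugment-style masking function
    keep: Callable[[Any, Any], bool],        # hypothetical: filter deciding which pseudo-labels to keep
    generations: int = 2,
) -> Any:
    """Generic self-training loop with augmentation (a sketch, not the paper's recipe)."""
    # Generation 0: train on the supervised set only, with augmented inputs.
    teacher = train([(augment(x), y) for x, y in labeled])

    for _ in range(generations):
        # The teacher labels the unlabeled data from un-augmented inputs.
        pseudo = [(x, transcribe(teacher, x)) for x in unlabeled]
        # Discard pseudo-labeled examples that fail the caller-supplied filter.
        pseudo = [(x, y) for x, y in pseudo if keep(x, y)]
        # The student is trained on augmented inputs from both sets,
        # then becomes the teacher for the next generation.
        mixed = list(labeled) + pseudo
        teacher = train([(augment(x), y) for x, y in mixed])

    return teacher
```

Passing the four steps in as callables keeps the sketch independent of any particular toolkit; in practice each would wrap a real acoustic model, decoder and augmentation policy.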
