Publication | Closed Access
Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Citations: 36
References: 28
Year: 2020
Unknown Venue
Engineering, Machine Learning, Spoken Language Processing, Language Processing, Speech Recognition, Natural Language Processing, Data Science, Robust Speech Recognition, Automatic Recognition, Voice Recognition, Language Models, Spontaneous Mandarin Speech, MBR Training, Health Sciences, Clinical Language, Computer Science, Minimum Bayes Risk, Distant Speech Recognition, Speech Communication, Multi-speaker Speech Recognition, Speech Acoustics, Speech Processing, Speech Input, End-to-end Speech Recognition, Speech Perception
In this work, we propose minimum Bayes risk (MBR) training of RNN-Transducer (RNN-T) for end-to-end speech recognition. Specifically, initialized with an RNN-T trained model, MBR training is conducted by minimizing the expected edit distance between the reference label sequence and N-best hypotheses generated on the fly. We also introduce a heuristic to incorporate an external neural network language model (NNLM) in RNN-T beam search decoding and explore MBR training with the external NNLM. Experimental results demonstrate that an MBR trained model substantially outperforms an RNN-T trained model, and that further improvements can be achieved if it is trained with an external NNLM. Our best MBR trained system achieves absolute character error rate (CER) reductions of 1.2% and 0.5% on read and spontaneous Mandarin speech, respectively, over a strong convolution and transformer based RNN-T baseline trained on ∼21,000 hours of speech.
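The objective described in the abstract, an expected edit distance over an N-best list, can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so several things here are assumptions: hypothesis posteriors are obtained by renormalizing the log-probabilities over the N-best list, the edit distance is a plain character-level Levenshtein distance, and the function names `edit_distance` and `mbr_loss` are hypothetical. A real training setup would compute this on differentiable model scores; this version only shows the arithmetic.

```python
import math


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mbr_loss(ref, nbest):
    """Expected edit distance over an N-best list.

    nbest is a list of (hypothesis_tokens, log_prob) pairs.
    Assumption: posteriors are renormalized over the N-best list only.
    """
    max_lp = max(lp for _, lp in nbest)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(lp - max_lp) for _, lp in nbest]
    total = sum(weights)
    return sum((w / total) * edit_distance(ref, hyp)
               for w, (hyp, _) in zip(weights, nbest))


# Illustrative, made-up N-best list for a five-character reference.
reference = list("今天天气好")
hypotheses = [
    (list("今天天气好"), math.log(0.5)),  # exact match, distance 0
    (list("今天天汽好"), math.log(0.3)),  # one substitution, distance 1
    (list("今天气好"), math.log(0.2)),    # one deletion, distance 1
]
loss = mbr_loss(reference, hypotheses)
```

With these made-up scores the posteriors are 0.5, 0.3, and 0.2 and the distances are 0, 1, and 1, so the expected edit distance is 0.5. Lowering it requires shifting probability mass toward hypotheses with fewer errors, which is the intuition behind training on this quantity rather than on the likelihood of the reference alone.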