Publication | Open Access
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
57 Citations | 15 References | 2016
Topics: Engineering, Machine Learning, Spoken Language Processing, Multilingual Pretraining, Recurrent Neural Network, Speech Recognition, Natural Language Processing, Data Science, Robust Speech Recognition, CTC Loss, Acoustic-to-Word LSTM Model, Real-time Language, Neural Speech Recognizer, Machine Translation, Health Sciences, CTC Word Models, Deep Learning, Speech Communication, Speech Processing, Speech Input, Speech Perception, Linguistics
We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units. We model the output vocabulary of about 100,000 words directly using deep bi-directional LSTM RNNs with CTC loss. The model is trained on 125,000 hours of semi-supervised acoustic training data, which enables us to alleviate the data sparsity problem for word models. We show that the CTC word models work very well as an end-to-end all-neural speech recognition model without the use of traditional context-dependent sub-word phone units that require a pronunciation lexicon, and without any language model, removing the need to decode. We demonstrate that the CTC word models perform better than a strong, more complex, state-of-the-art baseline with sub-word units.
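With no language model and no pronunciation lexicon, turning the network's per-frame word posteriors into a transcript can be as simple as CTC best-path (greedy) decoding. The sketch below is illustrative, not the authors' implementation. It assumes greedy decoding, a blank symbol at index 0, and a hypothetical three-entry vocabulary standing in for the roughly 100,000-word output layer. At each frame it takes the most likely unit, collapses consecutive repeats, and drops blanks.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 of the output layer is the CTC blank


def ctc_greedy_decode(
    frame_posteriors: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: int = BLANK,
) -> List[str]:
    """Best-path CTC decoding over a word vocabulary (no LM, no search)."""
    words: List[str] = []
    prev: Optional[int] = None
    for frame in frame_posteriors:
        # Most likely output unit (a word or the blank) for this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit a word only when the label changes and is not the blank;
        # a repeated word therefore needs a blank between its two emissions.
        if best != prev and best != blank:
            words.append(vocab[best])
        prev = best
    return words


# Hypothetical toy example: 5 frames over a 3-unit output layer.
vocab = ["<blank>", "hello", "world"]
posteriors = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # "hello"
    [0.2, 0.7, 0.1],    # "hello" again, collapsed as a repeat
    [0.8, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.8],    # "world"
]
print(ctc_greedy_decode(posteriors, vocab))  # ['hello', 'world']
```

Because each output unit is a whole word, the collapsed label sequence is already the transcript. A sub-word phone system would instead need a lexicon and a decoder search to map units to words.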