Publication | Closed Access
Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition
Citations: 298
References: 21
Year: 2017
Unknown Venue
Topics: Engineering, Machine Learning, Spoken Language Processing, Multilingual Pretraining, Recurrent Neural Network, Speech Recognition, Natural Language Processing, Data Science, Robust Speech Recognition, Acoustic-to-Word LSTM Model, Real-time Language, Neural Speech Recognizer, Machine Translation, Health Sciences, CTC Word Models, Deep Learning, Speech Communication, CTC Loss, Speech Processing, Speech Input, Speech Perception, Linguistics
We present results showing that it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units. We model the output vocabulary of about 100,000 words directly using deep bi-directional LSTM RNNs with CTC loss. The model is trained on 125,000 hours of semi-supervised acoustic training data, which enables us to alleviate the data sparsity problem for word models. We show that the CTC word models work very well as an end-to-end all-neural speech recognition model without the use of traditional context-dependent sub-word phone units that require a pronunciation lexicon, and without any language model, which removes the need to decode. We demonstrate that the CTC word models perform better than a strong, more complex, state-of-the-art baseline with sub-word units.
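To illustrate why a word-level CTC model can produce a transcript without a separate decoder, the following minimal sketch applies the standard CTC best-path rule: take the top-scoring output at each frame, merge consecutive repeats, and drop blanks. This is an assumption about how such a model could be read out, not the authors' implementation; the function name `ctc_greedy_words`, the blank index 0, the three-entry vocabulary, and the frame scores are all hypothetical toy values.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_words(frame_scores, vocab, blank=BLANK):
    """Best-path CTC readout: argmax per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring output unit at each frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Collapse runs of the same index into a single occurrence.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to words.
    return [vocab[idx] for idx in collapsed if idx != blank]


# Toy data: a real model would emit scores over roughly 100,000 words.
vocab = ["<blank>", "hello", "world"]
frame_scores = [
    [0.10, 0.80, 0.10],  # "hello"
    [0.20, 0.70, 0.10],  # "hello" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "world"
    [0.60, 0.20, 0.20],  # blank
]

print(" ".join(ctc_greedy_words(frame_scores, vocab)))  # prints "hello world"
```

Because the output units are whole words, the collapsed sequence is already the transcript; no pronunciation lexicon or language model lookup is involved in this sketch. A blank between two identical indices keeps them as separate words, which is how a repeated word would survive the merge step.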