End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

Abstract

We replace the Hidden Markov Model (HMM) traditionally used in continuous speech recognition with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes. The alignment between the input and output sequences is established using an attention mechanism: the decoder emits each symbol based on a context created from a subset of input symbols selected by the attention mechanism. We report initial results on the TIMIT dataset demonstrating that this new approach achieves phoneme error rates comparable to those of state-of-the-art HMM-based decoders.
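To make the context computation concrete, the following is a minimal sketch of a single attention step. The abstract does not specify the scoring function, so this assumes an additive (tanh-based) score followed by a softmax; the function name `attention_context`, the parameters `W`, `U`, `v`, and all toy values are hypothetical and chosen only for illustration.

```python
import math


def attention_context(decoder_state, encoder_states, W, U, v):
    """Return (context, weights) for one decoder step.

    Assumed scoring (not taken from the abstract):
        e_j = v . tanh(W s + U h_j)
    decoder_state:  list of floats, the decoder state s
    encoder_states: list of T encoder vectors h_j (e.g. bi-directional RNN outputs)
    W, U:           projection matrices given as lists of rows
    v:              scoring vector with one entry per row of W and U
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    # Project the decoder state once; it is shared by all input positions.
    Ws = matvec(W, decoder_state)

    # One scalar score per input position.
    scores = []
    for h in encoder_states:
        Uh = matvec(U, h)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, Ws, Uh)))

    # Softmax over input positions, shifted by the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]
    total = sum(exps)
    weights = [x / total for x in exps]

    # The context is the attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights


# Toy example: three input frames with 2-dimensional encoder states.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attention_context(
    decoder_state=[0.5],
    encoder_states=frames,
    W=[[1.0]],
    U=[[5.0, 0.0]],
    v=[1.0],
)
print(weights, context)
```

In a full decoder this step would run once per emitted phoneme, with the resulting context fed into the recurrent update that produces the next symbol. Input positions with near-zero weight contribute almost nothing, which is the sense in which the context is built from a subset of the input.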
