Publication | Closed Access
Integrating time alignment and neural networks for high performance continuous speech recognition
Citations: 89
References: 9
Year: 1991
Unknown Venue
Speech Sciences, Machine Learning, Neural Networks (Machine Learning), Engineering, Neural Network, Speech Recognition, Data Science, Robust Speech Recognition, Automatic Recognition, Neural Network Classifiers, Real-time Language, Time Alignment, Speech Signal Analysis, Health Sciences, Computer Science, Neural Networks (Computational Neuroscience), Neural Networks, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, Multi-speaker Speech Recognition, Speech Acoustics, Dynamic Programming, Speech Processing, Speech Input
The authors describe two systems in which neural network classifiers are merged with dynamic programming (DP) time alignment methods to produce high-performance continuous speech recognizers. One system uses the connectionist Viterbi-training (CVT) procedure, in which a neural network with frame-level outputs is trained using guidance from a time alignment procedure. The other system uses multi-state time-delay neural networks (MS-TDNNs), in which embedded DP time alignment allows network training with only word-level external supervision. The CVT results on the TI Digits are 99.1% word accuracy and 98.0% string accuracy. The MS-TDNNs are described in detail, with attention focused on their architecture, the training procedure, and results of applying the MS-TDNNs to continuous speaker-dependent alphabet recognition: on two speakers, word accuracy is 97.5% and 89.7%, respectively.
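As an illustration of the general idea of DP time alignment over frame-level classifier outputs, the following is a minimal sketch only, not the authors' CVT or MS-TDNN procedure. It assumes a strictly left-to-right state sequence in which each frame either stays in the current state or advances by one, and it assumes the per-frame scores are log-probabilities. The function name `viterbi_align` and its parameters are hypothetical.

```python
import math


def viterbi_align(log_probs, state_seq):
    """Align frames to a left-to-right state sequence (illustrative sketch).

    log_probs[t][k] is the assumed log score of class k at frame t,
    e.g. the log of a frame-level network output.
    state_seq lists class indices in the order they must be visited.
    Returns (best_score, alignment), where alignment[t] is the position
    in state_seq assigned to frame t.
    """
    num_frames, num_states = len(log_probs), len(state_seq)
    if num_frames < num_states:
        raise ValueError("need at least one frame per state")

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # The path must start in the first state.
    score[0][0] = log_probs[0][state_seq[0]]

    for t in range(1, num_frames):
        for s in range(num_states):
            stay = score[t - 1][s]
            advance = score[t - 1][s - 1] if s > 0 else neg_inf
            if advance > stay:
                best, back[t][s] = advance, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:  # leave unreachable cells at -inf
                score[t][s] = best + log_probs[t][state_seq[s]]

    # Backtrace from the final state at the final frame.
    alignment = [num_states - 1]
    for t in range(num_frames - 1, 0, -1):
        alignment.append(back[t][alignment[-1]])
    alignment.reverse()
    return score[num_frames - 1][num_states - 1], alignment


# Toy usage: four frames, two classes, visited in the order 0 then 1.
frames = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.8), math.log(0.2)],
    [math.log(0.3), math.log(0.7)],
    [math.log(0.1), math.log(0.9)],
]
best_score, frame_to_state = viterbi_align(frames, [0, 1])
```

The per-frame assignment returned here is the kind of alignment-derived guidance that a frame-level classifier could, in principle, be trained against; how the paper's systems actually use the alignment is not specified in this abstract.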