Publication | Closed Access
M-vectors: Sub-band Based Energy Modulation Features for Multi-stream Automatic Speech Recognition
Year: 2019 | Citations: 16 | References: 15 | Venue: Unknown
Keywords: Engineering, Energy Modulation Features, Speech Recognition, Speech Coding, Pattern Recognition, Robust Speech Recognition, Automatic Recognition, Voice Recognition, Health Sciences, Speech Perception, Distant Speech Recognition, Signal Processing, Speech Communication, Traditional MFCC Features, Automatic Speech Recognition, Voice, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Energy Modulations
In this paper, we propose a novel method for capturing energy modulations from different frequency bands of speech in frame-level feature vectors, called Modulation-vectors or M-vectors, for use in Automatic Speech Recognition (ASR) systems. We show that in different multi-stream setups, with parallel streams for M-vectors and the popular Mel-frequency Cepstral Coefficient (MFCC) features, word recognition performance improves by ≈ 5% for end-to-end systems, and by ≈ 18% and ≈ 16% for monophone and triphone HMM-GMM ASR systems, respectively, compared with using the traditional MFCC features.
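The abstract does not say how M-vectors are computed, so the following is only a rough illustration of the general idea of a frame-level feature that tracks how energy in each frequency band changes over time. It is not the authors' method. The function names (`frame_signal`, `band_energies`, `modulation_features`), the equal-width bands, the naive DFT, and the slope-over-context measure are all assumptions made for this sketch.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames (hypothetical helper)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def band_energies(frame, n_bands):
    """Energy in n_bands equal-width bands, from a naive DFT of one frame.

    Equal-width bands are an assumption of this sketch; a real front end
    would more likely use a perceptual (e.g. mel-spaced) filter bank.
    """
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    width = half // n_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(n_bands)]


def modulation_features(x, frame_len=64, hop=32, n_bands=4, context=2):
    """One vector per frame: slope of log band energy over nearby frames.

    The slope is a crude stand-in for "energy modulation" and is an
    assumption of this sketch, not the M-vector definition from the paper.
    """
    frames = frame_signal(x, frame_len, hop)
    log_e = [[math.log(e + 1e-10) for e in band_energies(f, n_bands)]
             for f in frames]
    feats = []
    for i in range(len(log_e)):
        lo = max(0, i - context)
        hi = min(len(log_e) - 1, i + context)
        span = max(1, hi - lo)  # avoid division by zero for a single frame
        feats.append([(log_e[hi][b] - log_e[lo][b]) / span
                      for b in range(n_bands)])
    return feats


# Usage: a steady tone has flat band energy, so its features are near zero;
# a tone whose amplitude grows over time gives a positive slope in its band.
n = 320
steady = [math.sin(2 * math.pi * 5 * t / 64) for t in range(n)]
rising = [(1 + t / n) * math.sin(2 * math.pi * 5 * t / 64) for t in range(n)]
steady_feats = modulation_features(steady)
rising_feats = modulation_features(rising)
```

In a multi-stream setup like the one the abstract describes, vectors of this kind would be fed in parallel with MFCC features. How the streams are combined is not stated in the abstract and is left out of the sketch.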