Publication | Closed Access
Auditory Teager energy cepstrum coefficients for robust speech recognition
Citations: 63
References: 10
Year: 2005
Unknown Venue
MFCC Baseline, Engineering, Speech Recognition, Data Science, Pattern Recognition, Noise, Robust Speech Recognition, Audio Signal Analysis, Automatic Recognition, Acoustic Analysis, Speech Signal Analysis, Health Sciences, Distant Speech Recognition, Signal Processing, Speech Communication, Speech Technology, Speech Acoustics, Feature Extraction Algorithm, Speech Processing, Speech Input, Speech Perception
In this paper, a feature extraction algorithm for robust speech recognition is introduced. The algorithm is motivated by human auditory processing and by the nonlinear Teager-Kaiser energy operator, which estimates the true energy of the source of a resonance. The proposed features are labeled Teager Energy Cepstrum Coefficients (TECCs). TECCs are computed by first filtering the speech signal through a dense, non-constant-Q Gammatone filterbank and then estimating the “true” energy of the signal’s source, i.e., the short-time average of the output of the Teager-Kaiser energy operator. Error analysis and speech recognition experiments show that the TECCs and the mel frequency cepstrum coefficients (MFCCs) perform similarly for clean recording conditions, while the TECCs perform significantly better than the MFCCs for noisy recognition tasks. Specifically, a relative word error rate improvement of 60% over the MFCC baseline is shown for the Aurora-3 database in the high-mismatch condition. An absolute error rate improvement ranging from 5% to 20% is shown for a phone recognition task in various types of additive noise.
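The energy-estimation step described in the abstract can be illustrated with a minimal sketch. It uses the standard discrete form of the Teager-Kaiser energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), averaged over short frames. Everything beyond that is an assumption made for illustration and is not taken from the paper: the band signals are assumed to be already filtered (the Gammatone filterbank is not implemented here), the log compression and DCT-II are the usual cepstral steps rather than confirmed details, and the names `teager_kaiser`, `frame_mean_energy`, `dct2`, `tecc_like`, and the `floor` parameter are hypothetical.

```python
import math


def teager_kaiser(x):
    """Discrete Teager-Kaiser operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_mean_energy(band, frame_len, hop):
    """Short-time average of the operator output for one band signal."""
    psi = teager_kaiser(band)
    return [
        sum(psi[start:start + frame_len]) / frame_len
        for start in range(0, len(psi) - frame_len + 1, hop)
    ]


def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def tecc_like(bands, frame_len, hop, n_coeffs, floor=1e-12):
    """Per-frame cepstrum-style features from band-passed signals.

    Assumption: `bands` holds the outputs of a filterbank applied elsewhere.
    The log and DCT steps are the conventional cepstral recipe, assumed here.
    """
    per_band = [frame_mean_energy(b, frame_len, hop) for b in bands]
    n_frames = min(len(e) for e in per_band)
    features = []
    for t in range(n_frames):
        # Floor the energies so the log stays defined (illustrative choice).
        log_e = [math.log(max(e[t], floor)) for e in per_band]
        features.append(dct2(log_e, n_coeffs))
    return features


# Usage: two synthetic "band" signals standing in for filterbank outputs.
band_a = [2.0 * math.cos(0.3 * n) for n in range(200)]
band_b = [0.5 * math.cos(0.7 * n) for n in range(200)]
feats = tecc_like([band_a, band_b], frame_len=50, hop=25, n_coeffs=2)
```

For a pure sinusoid A·cos(Ωn + φ) the operator output is the constant A²·sin²(Ω), so it tracks both the amplitude and the frequency of the resonance, which is why it is read as an estimate of the source's energy rather than the signal's squared amplitude alone.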