Publication | Closed Access
Single and multi-channel approaches for distant speech recognition under noisy reverberant conditions: I2R's system description for the ASpIRE challenge
Citations: 13 | References: 13 | Year: 2015 | Venue: unknown
Infocomm Research, Engineering, Machine Learning, Word Error Rate, Speech Recognition, Data Science, Noise, Robust Speech Recognition, Voice Recognition, ASpIRE Challenge, Health Sciences, Computer Engineering, Computer Science, Deep Learning, I2R, Multi-channel Approaches, Signal Processing, Distant Speech Recognition, Speech Communication, Voice, Multi-speaker Speech Recognition, Speech Separation, Speech Processing, Speech Input, Speech Perception
In this paper, we introduce the system developed at the Institute for Infocomm Research (I2R) for the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. The front-end processing system consists of a distributed beamforming algorithm that performs adaptive weighting and channel elimination, a speech dereverberation approach using a maximum-kurtosis criterion, and a robust voice activity detection (VAD) module based on the sub-harmonic ratio (SHR). The acoustic back-end consists of a multi-conditional Deep Neural Network (DNN) model that uses speaker-adapted features, combined with a decoding strategy that performs semi-supervised DNN model adaptation using weighted labels generated from the first-pass decoding output. On the single-microphone evaluation, our system achieved a word error rate (WER) of 44.8%. With the incorporation of beamforming on the multi-microphone evaluation, our system achieved an absolute WER improvement of over 6%, giving the best evaluation result of 38.5%.
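The WER figures above can be made concrete with a minimal sketch. It assumes the common textbook definition of WER (word-level Levenshtein distance divided by the number of reference words); the abstract does not describe the challenge's scoring tool, so this is not that procedure. The function names `word_error_rate` and `wer_improvement` and the example sentences are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace tokenization and unit cost for each
    substitution, insertion, and deletion.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_improvement(baseline: float, improved: float) -> tuple:
    """Return (absolute, relative) WER reduction for two WER values."""
    absolute = baseline - improved
    relative = absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    # Made-up sentences: one deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))

    # The two WER values reported in the abstract, in percent.
    absolute, relative = wer_improvement(44.8, 38.5)
    print(f"absolute: {absolute:.1f} points, relative: {relative:.1%}")
```

Under this definition, going from 44.8% to 38.5% is a reduction of 6.3 percentage points, or roughly 14% relative to the single-microphone result.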