Publication | Closed Access
Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise
Citations: 188
References: 21
Year: 2016
Venue: Unknown
Engineering, Spectrum Estimation, Speech Enhancement, Noise Reduction, Speech Recognition, Beamforming, Data Science, Noise, Digital Beamforming, Robust Speech Recognition, Steering Vector, Health Sciences, Time-frequency Masks, Watson Mixture Model, Computer Engineering, Multi-channel Processing, Inverse Problems, Acoustic Beamforming, Online/offline ASR, Signal Processing, Distant Speech Recognition, Speech Communication, Array Processing, Multi-speaker Speech Recognition, Speech Processing, Speech Separation, Speech Perception, Robust MVDR
This paper considers acoustic beamforming for noise-robust automatic speech recognition (ASR). A beamformer attenuates background noise by enhancing sound components coming from a direction specified by a steering vector. Hence, accurate steering vector estimation is paramount for successful noise reduction. Recently, a beamforming approach was proposed that employs time-frequency masks. In the speech recognition system we submitted to the CHiME-3 Challenge, we employed a new form of this approach, which uses a speech spectral model based on a complex Gaussian mixture model (CGMM) to estimate the time-frequency masks and the steering vector, but we did not provide technical details at the time. This paper elaborates on this technique and examines its effectiveness for ASR. Experimental results show that the CGMM-based approach outperforms a recently proposed mask estimator based on a Watson mixture model. In addition, the CGMM-based approach is extended to an online speech enhancement scenario, which allows the technique to be used in an online recognition setup. This online version reduces the CHiME-3 evaluation error rate from 15.60% to 8.47%, an improvement comparable to that obtained by batch processing.
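To make the role of the masks and the steering vector concrete, the following is a minimal NumPy sketch of generic mask-based MVDR beamforming. It is not the authors' CGMM estimator: it assumes the time-frequency masks are already available (from any mask estimator) and uses the textbook MVDR solution with a mask-weighted noise covariance. The function names `mvdr_weights` and `apply_beamformer`, the array layout `(F, T, M)`, and the diagonal-loading constant are all assumptions made for illustration.

```python
import numpy as np

def mvdr_weights(Y, speech_mask, noise_mask, loading=1e-6):
    """Estimate per-frequency MVDR weights from masks (illustrative sketch).

    Y:            complex STFT, shape (F, T, M) = (freq bins, frames, mics)
    speech_mask,
    noise_mask:   real masks in [0, 1], shape (F, T)
    Returns (w, h), each of shape (F, M): filter and steering vector.
    """
    F, T, M = Y.shape
    w = np.empty((F, M), dtype=complex)
    h = np.empty((F, M), dtype=complex)
    for f in range(F):
        Yf = Y[f]  # (T, M)
        # Mask-weighted spatial covariances: sum_t m_t * y_t y_t^H / sum_t m_t
        R_sn = np.einsum('t,tm,tn->mn', speech_mask[f], Yf, Yf.conj())
        R_sn /= speech_mask[f].sum() + 1e-12
        R_n = np.einsum('t,tm,tn->mn', noise_mask[f], Yf, Yf.conj())
        R_n /= noise_mask[f].sum() + 1e-12
        # Diagonal loading keeps the noise covariance invertible (assumed value).
        R_n = R_n + loading * (np.trace(R_n).real / M) * np.eye(M)
        # Steering vector: principal eigenvector of the speech covariance
        # estimate (speech-dominant covariance minus noise covariance).
        _, vecs = np.linalg.eigh(R_sn - R_n)  # eigenvalues in ascending order
        h[f] = vecs[:, -1]
        # MVDR solution: w = R_n^{-1} h / (h^H R_n^{-1} h)
        num = np.linalg.solve(R_n, h[f])
        w[f] = num / (h[f].conj() @ num)
    return w, h

def apply_beamformer(Y, w):
    """Apply y_hat(f, t) = w(f)^H y(f, t); returns shape (F, T)."""
    return np.einsum('fm,ftm->ft', w.conj(), Y)
```

Because the filter is normalized by `h^H R_n^{-1} h`, it passes the estimated steering direction with unit gain (`w^H h = 1`) while minimizing the output noise power. This is why an inaccurate steering vector translates directly into speech distortion or poor noise reduction.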