
Abstract

This paper considers acoustic beamforming for noise-robust automatic speech recognition (ASR). A beamformer attenuates background noise by enhancing sound components coming from a direction specified by a steering vector. Hence, accurate steering vector estimation is paramount for successful noise reduction. Recently, a beamforming approach was proposed that employs time-frequency masks. The speech recognition system we submitted to the CHiME-3 Challenge employed a new form of this approach, which uses a speech spectral model based on a complex Gaussian mixture model (CGMM) to estimate the time-frequency masks and the steering vector, but that submission did not provide technical details. This paper elaborates on the technique and examines its effectiveness for ASR. Experimental results show that the CGMM-based approach outperforms a recently proposed mask estimator based on a Watson mixture model. In addition, the CGMM-based approach is extended to an online speech enhancement scenario, which allows the technique to be used in an online recognition setup. This online version reduces the CHiME-3 evaluation error rate from 15.60% to 8.47%, an improvement comparable to that obtained by batch processing.
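To make the role of the time-frequency masks concrete, the following is a minimal NumPy sketch of how a mask might be turned into a steering vector and a beamformer for a single frequency bin. It is an illustration under stated assumptions, not the paper's implementation: the mask is taken as given (the CGMM estimator itself is not shown), the steering vector is assumed to be the principal eigenvector of a mask-weighted speech covariance estimate, and an MVDR beamformer is assumed as the spatial filter. The function names and the `loading` parameter are hypothetical.

```python
import numpy as np


def masked_covariance(Y, mask):
    """Mask-weighted spatial covariance for one frequency bin.

    Y:    (T, M) complex STFT frames from M microphones.
    mask: (T,) weights in [0, 1], one per frame.
    """
    w = mask / max(mask.sum(), 1e-10)
    # sum_t w_t * y_t y_t^H, written as a matrix product
    return (Y * w[:, None]).T @ Y.conj()


def estimate_steering_vector(Y, speech_mask):
    """Assumed estimator: principal eigenvector of (noisy-speech cov - noise cov)."""
    R_noisy = masked_covariance(Y, speech_mask)
    R_noise = masked_covariance(Y, 1.0 - speech_mask)
    R_speech = R_noisy - R_noise
    # eigh returns eigenvalues in ascending order, so take the last eigenvector
    _, vecs = np.linalg.eigh(R_speech)
    return vecs[:, -1]


def mvdr_weights(R_noise, h, loading=1e-6):
    """MVDR filter w = R_n^{-1} h / (h^H R_n^{-1} h), with diagonal loading."""
    M = R_noise.shape[0]
    scale = np.trace(R_noise).real / M
    R = R_noise + loading * scale * np.eye(M)
    num = np.linalg.solve(R, h)
    return num / np.vdot(h, num)


def apply_beamformer(Y, w):
    """Enhanced signal w^H y_t for every frame t."""
    return Y @ w.conj()
```

In a full system these functions would be applied independently to each frequency bin, with the mask supplied by whichever estimator is in use. The diagonal loading is a design choice made here for numerical safety when the noise covariance is poorly conditioned; it is not taken from the abstract.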
