Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator

TLDR

The short-time spectral amplitude (STSA) of speech plays a major role in its perception, and a class of speech enhancement systems exploits this. This work derives a minimum mean-square error (MMSE) STSA estimator by modeling the speech and noise spectral components as statistically independent Gaussian random variables, and compares the resulting system with systems based on Wiener filtering and spectral subtraction. The estimator's performance is analyzed against an STSA estimator derived from the Wiener estimator, and its behavior under uncertainty of signal presence is examined. To reconstruct the signal, the estimated amplitude is combined with the complex exponential of the noisy phase, which is shown to be the MMSE estimator of the complex exponential of the original phase that does not affect the STSA estimation. The approach significantly reduces noise, leaves colorless residual noise, and has complexity comparable to other systems in the class.

Abstract

This paper focuses on the class of speech enhancement systems which capitalize on the major importance of the short-time spectral amplitude (STSA) of the speech signal in its perception. A system which utilizes a minimum mean-square error (MMSE) STSA estimator is proposed and then compared with other widely used systems which are based on Wiener filtering and the "spectral subtraction" algorithm. In this paper we derive the MMSE STSA estimator, based on modeling speech and noise spectral components as statistically independent Gaussian random variables. We analyze the performance of the proposed STSA estimator and compare it with an STSA estimator derived from the Wiener estimator. We also examine the MMSE STSA estimator under uncertainty of signal presence in the noisy observations. In constructing the enhanced signal, the MMSE STSA estimator is combined with the complex exponential of the noisy phase. It is shown here that the latter is the MMSE estimator of the complex exponential of the original phase, which does not affect the STSA estimation. The proposed approach results in a significant reduction of the noise, and provides enhanced speech with colorless residual noise. The complexity of the proposed algorithm is approximately that of other systems in the discussed class.
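
The abstract does not spell out the estimator, so the following is only a minimal sketch of how the per-bin processing it describes might look. It assumes the commonly quoted closed-form gain for the Gaussian model, written in terms of an a priori SNR `xi` and an a posteriori SNR `gamma`. These names, the helper `bessel_i`, and the function `enhance_bin` are illustrative assumptions and do not come from the text above. How `xi` is estimated and how signal-presence uncertainty is handled are left out.

```python
import cmath
import math


def bessel_i(order, x, terms=80):
    """Modified Bessel function of the first kind, I_order(x), via its power series.

    Hypothetical helper for illustration; adequate for moderate x only.
    """
    half = x / 2.0
    term = half ** order / math.factorial(order)  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= half * half / (k * (k + order))   # ratio of consecutive series terms
        total += term
    return total


def mmse_stsa_gain(xi, gamma):
    """Assumed closed-form MMSE STSA gain under the Gaussian model.

    xi    : a priori SNR (speech variance / noise variance), assumed known
    gamma : a posteriori SNR (|noisy coefficient|^2 / noise variance), must be > 0
    """
    v = xi / (1.0 + xi) * gamma
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * math.exp(-v / 2.0) * bracket


def wiener_gain(xi):
    """Gain of the Wiener estimator, used here only for comparison."""
    return xi / (1.0 + xi)


def enhance_bin(noisy_coeff, xi, noise_var):
    """Estimate one clean spectral coefficient from one noisy coefficient.

    The estimated amplitude is combined with the complex exponential of the
    noisy phase, as the abstract describes.
    """
    gamma = abs(noisy_coeff) ** 2 / noise_var
    amplitude = mmse_stsa_gain(xi, gamma) * abs(noisy_coeff)
    return amplitude * cmath.exp(1j * cmath.phase(noisy_coeff))
```

Under these assumptions the gain approaches the Wiener gain at high SNR and never falls below it, since the conditional mean of the amplitude is at least the magnitude of the conditional mean of the complex coefficient. The series-based Bessel helper is meant for readability; a production implementation would more likely use exponentially scaled Bessel functions from a numerical library to stay stable for large arguments.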
