A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation

TLDR

Speech enhancement and separation are core problems in audio signal processing. They are crucial for devices such as mobile phones, conference call systems, hands-free systems, and hearing aids, and they serve as preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones, and these multichannel interfaces usually offer greater enhancement and separation capabilities than single-channel ones. Research has followed two convergent paths, starting from microphone array processing and blind source separation, yet a comprehensive overview of their common foundations and differences is lacking. The paper aims to fill this gap by analyzing a large number of established and recent techniques along four axes: the acoustic impulse response model, the spatial filter design criterion, the parameter estimation algorithm, and optional postfiltering. It concludes with a list of software and data resources and a discussion of perspectives and future trends in the field.

Abstract

Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
