Blind Speech Separation and Enhancement With GCC-NMF

Abstract

We present GCC-NMF, a blind source separation algorithm that combines unsupervised dictionary learning via non-negative matrix factorization (NMF) with spatial localization via the generalized cross-correlation (GCC) method. The dictionary is learned from the mixture signal itself, and separation is then achieved by grouping dictionary atoms, at each point in time, according to their spatial origins. The resulting algorithm is simple yet flexible, and requires no prior knowledge or information. Separation quality is evaluated on three tasks using stereo recordings from the publicly available SiSEC signal separation evaluation campaign: three and four concurrent speakers in reverberant environments, speech mixed with real-world background noise, and noisy recordings of a moving speaker. Performance is quantified using perceptually motivated and SNR-based measures, computed with the PEASS and BSS Eval toolkits, respectively. We evaluate the effects of model parameters on separation quality, and compare our approach with other unsupervised and semi-supervised speech separation and enhancement approaches. GCC-NMF outperforms task-specific approaches in each of the three settings, including blind approaches as well as several informed approaches that require prior knowledge or information.
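The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `nmf`, `atom_delays`, and `group_atoms` are hypothetical, the Euclidean-cost multiplicative updates (Lee and Seung) and the PHAT weighting are assumptions, and the grid of candidate delays is chosen by the caller. The sketch learns a non-negative dictionary from a magnitude spectrogram, estimates one delay per atom for a single time frame by weighting a GCC-PHAT function with the atom's spectrum, and assigns each atom to the nearest source delay.

```python
import numpy as np

def nmf(V, n_atoms, n_iter=200, seed=0, eps=1e-12):
    """Approximate a non-negative matrix V (freq x time) as W @ H.

    Uses Lee-Seung multiplicative updates for the Euclidean cost
    (an assumption; the abstract does not state the cost function).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], n_atoms)) + eps   # dictionary atoms (columns)
    H = rng.random((n_atoms, V.shape[1])) + eps   # atom activations over time
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def atom_delays(W, cross_phase, freqs_hz, taus_s):
    """Estimate one inter-channel delay per atom for a single time frame.

    cross_phase: unit-magnitude cross-spectrum X_right * conj(X_left) / |.|,
                 one complex value per frequency (PHAT weighting).
    taus_s:      grid of candidate delays in seconds.
    Each atom's spectrum weights the frequencies that enter its GCC function.
    """
    steering = np.exp(2j * np.pi * np.outer(freqs_hz, taus_s))   # (freq, tau)
    gcc = np.real(W.T @ (cross_phase[:, None] * steering))       # (atom, tau)
    return taus_s[np.argmax(gcc, axis=1)]

def group_atoms(delays_s, source_delays_s):
    """Assign each atom to the source whose delay is closest to the atom's."""
    sources = np.asarray(source_delays_s)
    return np.argmin(np.abs(delays_s[:, None] - sources[None, :]), axis=1)
```

In use, `W` would come from `nmf` applied to the mixture's magnitude spectrogram, and `atom_delays` followed by `group_atoms` would be repeated for every frame. Each source estimate could then be rebuilt from the atoms assigned to it in that frame, for example by masking the mixture spectrogram.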
