Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model

TLDR

Reverberant recording environments are modeled for under-determined convolutive blind source separation. Each source is represented as a zero-mean Gaussian whose covariance captures spatial characteristics; four covariance models, including a full-rank unconstrained one, are considered, and EM algorithms with tailored initialization and source-ordering procedures are derived. Experiments on synthetic reverberant mixtures and live speech recordings demonstrate the effectiveness of the proposed approach.

Abstract

This paper addresses the modeling of reverberant recording environments in the context of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial characteristics of the source. We then consider four specific covariance models, including a full-rank unconstrained model. We derive a family of iterative expectation-maximization (EM) algorithms to estimate the parameters of each model, and we propose procedures adapted from the state of the art to initialize the parameters and to align the order of the estimated sources across all frequency bins. Experimental results on reverberant synthetic mixtures and live recordings of speech data show the effectiveness of the proposed approach.
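
To make the modeling idea concrete, here is a minimal NumPy sketch of one possible EM loop for the full-rank unconstrained case at a single frequency bin. The abstract does not spell out the parameterization, so the sketch assumes that the covariance of source j in frame n factors into a scalar time-varying variance `v[j, n]` and a time-invariant Hermitian spatial covariance matrix `R[j]`. The function names, the random initialization, and the small regularization floor are all hypothetical choices made for illustration; in particular, the random initialization stands in for the tailored initialization procedure mentioned above and is not the authors' method.

```python
import numpy as np

def _wiener(v, R):
    """Per-source covariances and multichannel Wiener gains for every frame."""
    Sigma_c = v[:, :, None, None] * R[:, None, :, :]   # (J, N, I, I)
    Sigma_x = Sigma_c.sum(axis=0)                      # mixture covariance, (N, I, I)
    W = Sigma_c @ np.linalg.inv(Sigma_x)               # (J, N, I, I)
    return Sigma_c, W

def em_full_rank_bin(X, n_src, n_iter=20, floor=1e-10, seed=0):
    """EM sketch for one frequency bin.

    X: complex array of shape (N, I), N frames and I channels.
    Returns estimated source images C (J, N, I), variances v (J, N),
    and spatial covariances R (J, I, I).
    """
    N, I = X.shape
    rng = np.random.default_rng(seed)
    eye = np.eye(I)

    # Placeholder initialization (assumption): random Hermitian
    # positive-definite R[j] and flat variances.
    A = rng.standard_normal((n_src, I, I)) + 1j * rng.standard_normal((n_src, I, I))
    R = A @ A.conj().swapaxes(-1, -2) / I + eye
    v = np.full((n_src, N), np.mean(np.abs(X) ** 2) / n_src)

    for _ in range(n_iter):
        # E-step: posterior mean and second-order moment of each source image.
        Sigma_c, W = _wiener(v, R)
        C = np.einsum('jnab,nb->jna', W, X)
        Rc = C[..., :, None] * C[..., None, :].conj() + (eye - W) @ Sigma_c

        # M-step: v[j, n] = tr(R[j]^-1 Rc[j, n]) / I, then R[j] as the
        # average of the variance-normalized posterior moments.
        v = np.real(np.einsum('jab,jnba->jn', np.linalg.inv(R), Rc)) / I
        v = np.maximum(v, floor)
        R = np.mean(Rc / v[:, :, None, None], axis=1)
        # Re-symmetrize and lightly regularize to keep R[j] invertible.
        R = 0.5 * (R + R.conj().swapaxes(-1, -2)) + floor * eye

    # Final separation with the last parameter estimates.
    _, W = _wiener(v, R)
    C = np.einsum('jnab,nb->jna', W, X)
    return C, v, R
```

In a complete system this routine would run independently in every frequency bin of the mixture's STFT, which leaves the source order arbitrary from bin to bin; that is why a separate step to align the estimated sources across frequency bins is needed before resynthesis. Because the Wiener gains sum to the identity, the estimated source images add up to the observed mixture in each frame.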
