Publication | Closed Access
Support vector machines using GMM supervectors for speaker verification
Citations: 1K | References: 14 | Year: 2006
Engineering · Machine Learning · Gaussian Mixture Models · Speech Recognition · Data Science · Pattern Recognition · New SVM Kernels · Speaker Diarization · Robust Speech Recognition · Voice Recognition · Health Sciences · Deep Learning · GMM Supervectors · Speech Communication · SVM Kernels · Multi-speaker Speech Recognition · Speech Processing · Speech Perception · Speaker Recognition
Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker recognition. The standard training method for GMM models is to use MAP adaptation of the means of the mixture components based on speech from a target speaker. Recent methods in compensation for speaker and channel variability have proposed the idea of stacking the means of the GMM model to form a GMM mean supervector. We examine the idea of using the GMM supervector in a support vector machine (SVM) classifier. We propose two new SVM kernels based on distance metrics between GMM models. We show that these SVM kernels produce excellent classification accuracy in a NIST speaker recognition evaluation task.
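The pipeline the abstract describes (relevance-MAP adaptation of UBM mixture means per utterance, stacking the adapted means into a supervector, then classifying supervectors with a linear-kernel SVM) can be sketched with scikit-learn. This is a minimal toy illustration under stated assumptions, not the authors' implementation: the synthetic two-speaker data, the function names, the relevance factor of 16, and the sqrt(weight)/sigma scaling of the stacked means are all illustrative choices.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def train_ubm(pooled_frames, n_components=4):
    """Fit a Universal Background Model (UBM): a GMM on frames pooled
    over many utterances (diagonal covariances, as is typical)."""
    ubm = GaussianMixture(n_components=n_components,
                          covariance_type="diag", random_state=0)
    ubm.fit(pooled_frames)
    return ubm

def map_adapt_means(ubm, frames, relevance=16.0):
    """Relevance-MAP adaptation of the mixture means only, using the
    utterance's zeroth- and first-order Baum-Welch statistics."""
    post = ubm.predict_proba(frames)            # (T, C) responsibilities
    n_c = post.sum(axis=0)                      # zeroth-order stats
    f_c = post.T @ frames                       # first-order stats, (C, D)
    alpha = (n_c / (n_c + relevance))[:, None]  # adaptation coefficients
    ml_means = np.where(n_c[:, None] > 0,
                        f_c / np.maximum(n_c, 1e-10)[:, None],
                        ubm.means_)
    return alpha * ml_means + (1 - alpha) * ubm.means_

def supervector(ubm, frames):
    """Stack the adapted means, each scaled by sqrt(w_c)/sigma_c so that a
    plain dot product approximates a weighted, variance-normalized kernel
    (an assumed normalization for this sketch)."""
    means = map_adapt_means(ubm, frames)
    scale = np.sqrt(ubm.weights_)[:, None] / np.sqrt(ubm.covariances_)
    return (scale * means).ravel()

# Synthetic two-"speaker" data: utterances of 200 frames x 3 features,
# with the second speaker's feature distribution shifted.
utts, labels = [], []
for spk, shift in [(0, 0.0), (1, 1.5)]:
    for _ in range(10):
        utts.append(rng.normal(shift, 1.0, size=(200, 3)))
        labels.append(spk)

ubm = train_ubm(np.vstack(utts))
X = np.array([supervector(ubm, u) for u in utts])
labels = np.array(labels)

# Linear-kernel SVM on supervectors; train on even indices, test on odd.
idx = np.arange(len(utts))
svm = SVC(kernel="linear").fit(X[idx % 2 == 0], labels[idx % 2 == 0])
acc = svm.score(X[idx % 2 == 1], labels[idx % 2 == 1])
```

Each supervector has dimension C x D (here 4 components x 3 features = 12), so the SVM operates in a fixed-length space regardless of utterance length, which is the property that makes the supervector representation attractive.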