Training Universal Background Models for Speaker Recognition

Abstract

Universal background models (UBM) in speaker recognition systems are typically Gaussian mixture models (GMM) trained from a large amount of data using the maximum likelihood criterion. This paper investigates three alternative criteria for training the UBM. In the first, we cluster an existing automatic speech recognition (ASR) acoustic model to generate the UBM. In each of the other two, we use statistics based on the speaker labels of the development data to regularize the maximum likelihood objective function in training the UBM. We present an iterative algorithm similar to the expectation maximization (EM) algorithm to train the UBM for each of these regularized maximum likelihood criteria. We present several experiments that show how combining only two systems outperforms the best published results on the English telephone tasks of the NIST 2008 speaker recognition evaluation.

Improved user security in speech-driven telephony applications can be achieved with automatic speaker verification. Current automatic speaker verification systems face significant challenges caused by adverse acoustic conditions. Telephone band limitation, channel/transducer variability, as well as natural speech variability have a negative impact on the performance of speaker verification systems. Degradation in the performance of these systems due to inter-session variability has been one of the main challenges to the deployment of speaker verification technologies. We investigate how integrating more information about the development and test sets into the speaker recognition system may improve its performance and robustness. In this work, we propose two main approaches for training the UBM. In the first, the UBM is constructed by using the Kullback-Leibler (KL) distance as a measure for clustering the Gaussian components of an ASR acoustic model.
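As a rough illustration of a KL-based distance between Gaussian components, the sketch below computes the closed-form KL divergence between two diagonal-covariance Gaussians and uses a symmetrized version to find the closest pair in a small set. The text does not specify the covariance structure, the exact form of the KL distance, or the clustering procedure, so the diagonal covariances, the symmetrization, the greedy pair search, and all function names (`kl_diag_gauss`, `symmetric_kl`, `closest_pair`) are assumptions made for this example only.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def symmetric_kl(g1, g2):
    """Symmetrized KL distance (an assumed choice): KL(p||q) + KL(q||p)."""
    (mu1, var1), (mu2, var2) = g1, g2
    return (kl_diag_gauss(mu1, var1, mu2, var2)
            + kl_diag_gauss(mu2, var2, mu1, var1))

def closest_pair(gaussians):
    """Return (distance, i, j) for the two components that are closest.

    A bottom-up clustering would merge this pair and repeat; the merge
    step is left out because the source text does not describe it.
    """
    best = None
    for i in range(len(gaussians)):
        for j in range(i + 1, len(gaussians)):
            d = symmetric_kl(gaussians[i], gaussians[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

# Toy components as (mean, variance) pairs; the values are made up.
gaussians = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([0.1, 0.0], [1.0, 1.0]),
    ([5.0, 5.0], [2.0, 0.5]),
]
distance, i, j = closest_pair(gaussians)
```

In this toy set the first two components differ only by a small mean shift, so they form the closest pair, while the third component is far from both.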
This approach attempts to exploit the context-dependent phonetic information of the ASR acoustic model in estimating the UBM parameters. Hereafter, this method is called the phonetically inspired UBM (PIUBM) approach. The approach is motivated by the fact that many speaker characteristics are conditioned on particular phonetic units or phonetic classes and therefore may be better modeled using a UBM trained with an explicit
