Cohort selection and word grammar effects for speaker recognition

Abstract

Automatic speaker recognition systems are maturing and databases have been designed to specifically compare algorithms and results to target error rates. The LDC YOHO speaker verification database was designed to test error rates at the 1% false rejection and 0.1% false acceptance level. This work examines the use of speaker-dependent (SD) monophone models to meet these requirements. By representing each speaker with 22 monophones, both closed-set speaker identification and global-threshold verification was performed. Using four combination lock phrases, speaker identification error rates are obtained at 0.19% for males and 0.31% for females. By defining a test hypothesis, a critical error analysis for speaker verification is developed and new results reported for YOHO. A new Bhattacharyya distance is developed for cohort selection. This method, based on the second order statistics of the enrolment Viterbi log-likelihoods, determines the optimal cohorts and achieves an equal error rate of 0.282%.

References

Page 1

	Year	Citations

Page 1