Using prosodic and conversational features for high-performance speaker recognition: report from JHU WS'02

Abstract

While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation and only modest improvements have been seen when used in conjunction with traditional cepstral GMM systems. In contrast, we report here on work from the JHU 2002 Summer Workshop exploring a range of prosodic features, using as testbed the 2001 NIST Extended Data task. We examined a variety of modeling techniques, such as n-gram models of turn-level prosodic features and simple vectors of summary statistics per conversation side scored by kth nearest-neighbor classifiers. We found that purely prosodic models were able to achieve equal error rates of under 10%, and yielded significant gains when combined with more traditional systems. We also report on exploratory work on "conversational" features, capturing properties of the interaction across conversation sides, such as turn-taking patterns.
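For readers unfamiliar with the equal error rate (EER) figure quoted in the abstract, the following is a minimal sketch of how such a number is commonly computed from verification scores. The function name `equal_error_rate` and the example scores are illustrative assumptions; they are not taken from the workshop systems or the NIST data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) at the point where the false-accept rate
    and false-reject rate are closest. Illustrative sketch only."""
    # Candidate thresholds: every observed score.
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is >= the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of the two rates at the closest crossing.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores: higher means "more likely the claimed speaker".
target = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(target, impostor)
```

An EER under 10% means that, at the operating point where both error types are equally likely, fewer than one trial in ten is misclassified.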
