Publication | Closed Access
A lognormal tied mixture model of pitch for prosody based speaker recognition
Year: 1997
Venue: Unknown
Citations: 85
References: 3
Music, Speech Sciences, Engineering, Speech Kinematics, Phonology, Acoustic Modeling, Speech Recognition, Data Science, Pitch Tracks, Speaker Identification, Speaker Diarization, Robust Speech Recognition, Voice Recognition, Acoustic Analysis, Pitch Statistics, Health Sciences, Computer Science, Signal Processing, Speech Communication, Speech Technology, Pitch Tracker, Voice, Multi-speaker Speech Recognition, Speech Acoustics, Speech Processing, Statistical Inference, Speech Perception, Speaker Recognition
Statistics of pitch have recently been used in speaker recognition systems with good results. The success of such systems depends on robust and accurate computation of pitch statistics in the presence of pitch tracking errors. In this work, we develop a statistical model of pitch that allows unbiased estimation of pitch statistics from pitch tracks that are subject to doubling and/or halving. First, we argue from a simple correlation model, and demonstrate empirically with QQ plots, that “clean” pitch follows a lognormal distribution rather than the often-assumed normal distribution. Second, we present a probabilistic model of the pitch estimated by a pitch tracker in the presence of doubling/halving, which leads to a mixture of three lognormal distributions with tied means and variances, for a total of four free parameters. We use the resulting pitch statistics as features for speaker verification on the March 1996 NIST Speaker Recognition Evaluation data (a subset of Switchboard) and report results on the most difficult portion of the database: the “one-session” condition, with males only for both the claimant and impostor speakers. Pitch statistics provide a 22% reduction in false alarm rate at a 1% miss rate and an 11% reduction in false alarm rate at a 10% miss rate over the cepstrum-only system.
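The tied mixture described in the abstract can be illustrated with a small sketch. It rests on assumptions the abstract does not spell out: that in the log domain the three components are Gaussians with means μ − log 2 (halving), μ (correct), and μ + log 2 (doubling), that they share one standard deviation σ, and that the two free mixture weights bring the count to four parameters. The abstract does not state the fitting procedure; the expectation-maximization (EM) loop and the function name `fit_tied_lognormal_mixture` below are hypothetical, chosen only for illustration.

```python
import numpy as np

LOG2 = np.log(2.0)
# Assumed log-domain offsets of the three components: halving, correct, doubling.
OFFSETS = np.array([-LOG2, 0.0, LOG2])


def fit_tied_lognormal_mixture(pitch_hz, n_iter=200):
    """Fit (mu, sigma, weights) to positive pitch values with EM.

    Works on log-pitch, where each lognormal component becomes a Gaussian.
    The component means are tied to one mu and the variance is shared,
    so the free parameters are mu, sigma, and two independent weights.
    This is an illustrative sketch, not the paper's estimator.
    """
    x = np.log(np.asarray(pitch_hz, dtype=float))  # voiced frames only, all > 0
    mu = np.median(x)                  # median is robust to halved/doubled frames
    sigma = max(float(np.std(x)), 1e-3)
    w = np.array([0.1, 0.8, 0.1])      # assumed initial weights

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        z = (x[:, None] - mu - OFFSETS[None, :]) / sigma
        log_p = np.log(w)[None, :] - 0.5 * z ** 2
        log_p -= log_p.max(axis=1, keepdims=True)  # stabilize the exponentials
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: closed-form updates under the tying constraints.
        w = np.clip(r.mean(axis=0), 1e-12, None)
        w /= w.sum()
        shifted = x[:, None] - OFFSETS[None, :]    # undo each component's offset
        mu = np.sum(r * shifted) / x.size
        sigma = max(float(np.sqrt(np.sum(r * (shifted - mu) ** 2) / x.size)), 1e-6)

    return mu, sigma, w
```

Under these assumptions, `mu` and `sigma` estimate the mean and standard deviation of the clean log-pitch, which is what makes the resulting statistics insensitive to doubled or halved frames. The weights give the estimated fractions of halved, correct, and doubled frames.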