Publication | Closed Access
Duration mismatch compensation for i-vector based speaker recognition systems
Citations: 122 | References: 25 | Year: 2013
Unknown Venue
Topics: Engineering, Health Sciences, Duration Mismatch Compensation, Speaker Identification, Robust Speech Recognition, Speech Processing, Utterance Duration, Voice Recognition, Speech Input, Speech Technology, Speech Perception, Log Duration, Distant Speech Recognition, Signal Processing, Duration Variability, Speech Communication, Speaker Recognition, Speech Recognition
Speaker recognition systems trained on long-duration utterances are known to perform significantly worse when they encounter short test segments. To address this mismatch, we analyze the effect of duration variability on the phoneme distributions of speech utterances and on i-vector length. We demonstrate that, as utterance duration decreases, the number of detected unique phonemes and the i-vector length approach zero in a logarithmic and a non-linear fashion, respectively. Treating duration variability as additive noise in the i-vector space, we propose three strategies for its compensation: (i) multi-duration training of the Probabilistic Linear Discriminant Analysis (PLDA) model, (ii) score calibration using log duration as a Quality Measure Function (QMF), and (iii) multi-duration PLDA training with synthesized short-duration i-vectors. Experiments follow the 2012 National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE) protocol with varying test-utterance durations. The results demonstrate the effectiveness of the proposed schemes under short-duration test conditions, especially the QMF calibration approach.
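The idea behind strategy (ii) can be illustrated as a linear score transform whose weights are learned from labeled trials, with log duration entering as an extra input alongside the raw PLDA score. The Python/NumPy sketch below is a hedged illustration, not the authors' implementation. The function names (`qmf_features`, `train_qmf_calibration`, `calibrate`) are hypothetical. The use of plain unweighted logistic regression, with no prior weighting, is an assumption. So is the choice to use only the test-segment duration in seconds, and the toy score model in the usage section is likewise assumed.

```python
import numpy as np


def qmf_features(scores, durations_sec):
    """Stack [1, raw score, log duration] per trial (hypothetical QMF layout)."""
    scores = np.asarray(scores, dtype=float)
    log_dur = np.log(np.asarray(durations_sec, dtype=float))  # log duration as the QMF
    return np.column_stack([np.ones_like(scores), scores, log_dur])


def train_qmf_calibration(scores, durations_sec, is_target, iters=500, lr=0.1):
    """Fit w so that w . [1, s, log d] approximates the target log-odds.

    Assumption: plain (unweighted) logistic regression trained by gradient
    descent, a simplification of prior-weighted calibration objectives.
    """
    X = qmf_features(scores, durations_sec)
    y = np.asarray(is_target, dtype=float)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probability of "target"
        grad = X.T @ (p - y) / len(y)      # gradient of mean cross-entropy
        w -= lr * grad
    return w


def calibrate(scores, durations_sec, w):
    """Map raw scores to duration-aware calibrated scores."""
    return qmf_features(scores, durations_sec) @ w


# Usage on synthetic toy data (assumed score model, not results from the paper):
rng = np.random.default_rng(0)
n = 2000
dur = rng.uniform(3.0, 60.0, size=n)        # test durations in seconds
is_tgt = rng.random(n) < 0.5                # target / non-target labels
# Toy model: target scores separate further from non-targets as duration grows.
raw = np.where(is_tgt, np.log(dur), 0.0) + rng.normal(size=n)

w = train_qmf_calibration(raw, dur, is_tgt)
cal = calibrate(raw, dur, w)
```

Because log duration enters as its own linear term, the learned offset, and therefore the effective decision threshold, can shift with segment length while the raw score keeps a single scale factor. A fuller treatment would weight target and non-target trials by an application prior.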