Publication | Closed Access
Modelling speaker and channel variability using deep neural networks for robust speaker verification
Citations: 32
References: 15
Year: 2016
Venue: Unknown
Keywords: Channel Variability, Engineering, Machine Learning, Biometrics, Equal Error Rate, Speech Recognition, Pattern Recognition, Speaker Diarization, Robust Speech Recognition, PLDA Baseline, Robust Speaker Verification, Computer Science, Deep Learning, Distant Speech Recognition, Deep Neural Network, Speech Communication, Deep Neural Networks, Multi-speaker Speech Recognition, Speech Processing, Speaker Recognition
We propose to improve the performance of i-vector based speaker verification by processing the i-vectors with a deep neural network before they are fed to a cosine distance or probabilistic linear discriminant analysis (PLDA) classifier. To this end, we build on an existing model that we refer to as Non-linear Within Class Normalization (NWCN) and introduce a novel Speaker Classifier Network (SCN). Both models deliver strong speaker verification performance, showing relative improvements of 56% and 68%, respectively, over standard i-vectors when combined with a cosine distance backend. The NWCN model also reduces the equal error rate for PLDA from 1.78% to 1.63%. We also test these models under domain mismatch, i.e. when no in-domain training data is available. Under these conditions, SCN features in combination with cosine distance perform better than the PLDA baseline, achieving an equal error rate of 2.92% as compared to 3.37%.
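The abstract reports results in terms of cosine distance scoring, equal error rate (EER), and relative improvement. The following is a minimal sketch of those three quantities, not the authors' implementation: the function names `cosine_score`, `equal_error_rate`, and `relative_improvement` are hypothetical, the EER is approximated by a simple threshold sweep over the observed scores, and the trial scores in the usage lines are made-up toy values.

```python
import numpy as np


def cosine_score(a, b):
    """Cosine similarity of two embeddings (higher suggests same speaker)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    Assumption: a trial is accepted when its score is >= the threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2)


def relative_improvement(baseline, improved):
    """Relative reduction of an error rate, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Toy usage with made-up scores (not data from the paper).
toy_eer = equal_error_rate([0.9, 0.6, 0.4, 0.8], [0.1, 0.5, 0.3, 0.7])

# EER pairs quoted in the abstract.
plda_gain = relative_improvement(1.78, 1.63)
mismatch_gain = relative_improvement(3.37, 2.92)
```

Applied to the figures quoted in the abstract, the same arithmetic gives a relative EER reduction of roughly 8.4% for NWCN with PLDA (1.78% to 1.63%) and roughly 13.4% for SCN with cosine distance over the PLDA baseline under domain mismatch (3.37% to 2.92%). These derived percentages are not stated in the abstract itself.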