Publication | Closed Access
Discriminative autoencoders for speaker verification
Citations: 15
References: 28
Year: 2017
Unknown Venue
Identity Codes, Engineering, Machine Learning, Health Sciences, Pattern Recognition, Multi-speaker Speech Recognition, Biometrics, Speaker Diarization, Robust Speech Recognition, Discriminative Autoencoders, Speech Processing, Speaker Discrimination, Neural Networks, Voice Recognition, Speech Perception, Speech Communication, Speaker Recognition, Speech Recognition
This paper presents a neural-network-based learning and scoring framework for speaker verification. The framework uses an autoencoder as its primary structure, and its objective function jointly considers three factors for speaker discrimination. The first, the sample reconstruction error, makes the structure essentially a generative model, which helps it learn the most salient and useful properties of the data. The other two operate on the middlemost hidden layer: one encourages utterances spoken by the same speaker to be mapped to similar identity codes in the speaker-discriminative subspace, and the other maximizes, to some extent, the dispersion of all identity codes so as to avoid over-concentration. Finally, the decision score for each utterance pair is computed simply as the cosine similarity of their identity codes. With utterances represented by i-vectors, experiments on the male portion of the core task of the NIST 2010 Speaker Recognition Evaluation (SRE) clearly demonstrate the merits of our approach over the conventional probabilistic linear discriminant analysis (PLDA) method.
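The three-factor objective and the cosine scoring rule described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact form of the terms, so everything concrete here is an assumption made for illustration: the squared-error reconstruction term, the scatter-based within-speaker and dispersion terms, the weights `alpha` and `beta`, and the function names `dcae_objective` and `cosine_score`. It should not be read as the authors' formulation.

```python
import numpy as np


def cosine_score(code_a, code_b):
    """Decision score for an utterance pair: cosine similarity of identity codes."""
    code_a = np.asarray(code_a, dtype=float)
    code_b = np.asarray(code_b, dtype=float)
    denom = np.linalg.norm(code_a) * np.linalg.norm(code_b)
    return float(np.dot(code_a, code_b) / denom)


def dcae_objective(x, x_hat, codes, speaker_ids, alpha=1.0, beta=1.0):
    """Illustrative three-term loss (assumed form, not taken from the paper).

    x           : (n, d) input vectors, e.g. i-vectors
    x_hat       : (n, d) autoencoder reconstructions of x
    codes       : (n, k) identity codes from the middlemost hidden layer
    speaker_ids : (n,)   speaker label of each utterance
    alpha, beta : hypothetical weights for the two code-level terms
    """
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    codes = np.asarray(codes, dtype=float)
    speaker_ids = np.asarray(speaker_ids)

    # Term 1: sample reconstruction error (mean squared L2 distance).
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))

    # Term 2: same-speaker codes should be similar, measured here as the
    # mean squared distance of each code to its own speaker's mean code.
    within = 0.0
    for spk in np.unique(speaker_ids):
        group = codes[speaker_ids == spk]
        within += np.sum((group - group.mean(axis=0)) ** 2)
    within /= len(codes)

    # Term 3: dispersion of all codes around the global mean. It is to be
    # maximized, so it enters the minimized loss with a negative sign.
    dispersion = np.mean(np.sum((codes - codes.mean(axis=0)) ** 2, axis=1))

    return float(recon + alpha * within - beta * dispersion)
```

In a trained system, `x_hat` and `codes` would come from the decoder output and the middlemost hidden layer of the autoencoder. At test time only `cosine_score` is needed, applied to the identity codes of the enrollment and test utterances.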