Publication | Closed Access
Learnable MFCCs for Speaker Verification
Citations: 17
References: 21
Year: 2021
Unknown Venue
Learnable MFCCs, Engineering, Machine Learning, Biometrics, Standard MFCC Extractor, Speech Recognition, Static MFCCs, Speaker Diarization, Robust Speech Recognition, Voice Recognition, Health Sciences, Computer Engineering, Computer Science, Deep Learning, Deep Neural Network, Signal Processing, Distant Speech Recognition, Speech Communication, Multi-speaker Speech Recognition, Speech Processing, Speech Perception, Speaker Recognition
We propose a learnable mel-frequency cepstral coefficient (MFCC) front-end architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the front-end to be adapted flexibly to data. In practice, we formulate data-driven versions of the four linear transforms in a standard MFCC extractor: windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). Reported results show relative improvements in equal error rate (EER) of up to 6.7% (VoxCeleb1) and 9.7% (SITW) over static MFCCs, without additional tuning effort.
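To make the four linear transforms named in the abstract concrete, the following is a minimal NumPy sketch of a static MFCC extractor in which each transform is an explicit array. It is not the authors' implementation: the abstract does not describe how the learnable versions are parameterized or constrained, so the sketch only shows the fixed baseline that such a front-end would presumably start from. The function names, the Hamming window, the 16 kHz sample rate, 40 mel filters and 20 coefficients are all illustrative assumptions.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft_matrix(n_fft):
    """One-sided DFT as a complex matrix, shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]
    t = np.arange(n_fft)[None, :]
    return np.exp(-2j * np.pi * k * t / n_fft)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    top = hz_to_mel(sample_rate / 2.0)
    edges = mel_to_hz(np.linspace(0.0, top, n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_matrix(n_ceps, n_filters):
    """Orthonormal DCT-II, shape (n_ceps, n_filters)."""
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    mat = np.sqrt(2.0 / n_filters) * np.cos(
        np.pi * k * (2 * n + 1) / (2 * n_filters)
    )
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc(frames, sample_rate=16000, n_filters=40, n_ceps=20):
    """Static MFCCs for pre-cut frames of shape (n_frames, frame_len).

    All parameter defaults are assumptions made for illustration.
    """
    n_fft = frames.shape[1]
    window = np.hamming(n_fft)                         # 1. windowing
    spectrum = (frames * window) @ dft_matrix(n_fft).T  # 2. DFT
    power = np.abs(spectrum) ** 2                      # non-linearity
    fb = mel_filterbank(n_filters, n_fft, sample_rate)
    log_mel = np.log(power @ fb.T + 1e-10)             # 3. mel filterbank + log
    return log_mel @ dct_matrix(n_ceps, n_filters).T   # 4. DCT
```

In a learnable variant, the window vector and the three matrices would be treated as trainable parameters, for example initialized to the fixed values above; that initialization choice is likewise an assumption here rather than something the abstract states.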