Publication | Closed Access
Application of Convolutional Neural Networks to Language Identification in Noisy Conditions
Citations: 46
References: 21
Year: 2014
Unknown Venue
Convolutional Neural Network, Engineering, Machine Learning, Spoken Language Processing, Speech Recognition, Natural Language Processing, Data Science, Pattern Recognition, Computational Linguistics, Robust Speech Recognition, Voice Recognition, Noisy Conditions, Language Studies, Deep Learning, Speech Communication, Automatic Speech Recognition, Multi-speaker Speech Recognition, Convolutional Neural Networks, Language Recognition, Posterior Probabilities, Speech Processing, Speech Input, Speech Perception, Linguistics
This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR). In the CNN/i-vector frontend, the CNN replaces the universal background model (UBM) as the source of the posterior probabilities used for i-vector training and extraction. The CNN/posterior frontend resembles a phonetic system in that the occupation counts of (tied) triphone states (senones) given by the CNN are used for classification. These counts are compressed to a low-dimensional vector using probabilistic principal component analysis (PPCA). Evaluated on heavily degraded speech data, the proposed frontends provide significant improvements of up to 50% in average equal error rate compared to a UBM/i-vector baseline. Moreover, the proposed frontends are complementary and, when combined, give significant gains of up to 20% relative to the best single system.
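The CNN/posterior idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the random Dirichlet "posteriors", the toy sizes (20 senones, 40 utterances, 5 output dimensions), the per-frame normalization of the counts, and the function names `occupation_counts`, `fit_ppca`, and `ppca_transform` are all assumptions made for illustration. The PPCA step uses the standard closed-form maximum-likelihood solution.

```python
import numpy as np


def occupation_counts(posteriors):
    """Sum frame-level senone posteriors (frames x senones) into per-senone counts."""
    return posteriors.sum(axis=0)


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA fit on the rows of X (requires q < X.shape[1])."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                 # noise variance: mean of discarded eigenvalues
    W = eigvecs[:, :q] * np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    return mu, W, sigma2


def ppca_transform(X, mu, W, sigma2):
    """Posterior mean of the latent vector for each row of X."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T


# Toy data standing in for CNN outputs (assumption: real senone sets are far larger).
rng = np.random.default_rng(0)
n_utts, n_senones, n_dims = 40, 20, 5
features = []
for _ in range(n_utts):
    n_frames = int(rng.integers(50, 200))
    post = rng.dirichlet(np.ones(n_senones), size=n_frames)  # each row sums to 1
    # Assumption: divide by frame count so utterances of different length are comparable.
    features.append(occupation_counts(post) / n_frames)
X = np.vstack(features)

mu, W, sigma2 = fit_ppca(X, n_dims)
Z = ppca_transform(X, mu, W, sigma2)  # low-dimensional vectors for a downstream classifier
```

Each row of `Z` would then be passed to a language classifier; which classifier is used, and how the counts are normalized before PPCA, is not stated in the abstract.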