Publication | Closed Access
Speaker adaptation of context dependent deep neural networks
Citations: 231 | References: 22 | Year: 2013 | Venue: Unknown
Topics: Engineering, Machine Learning, Weight Decay, Speech Recognition, L2 Regularization, Speaker Diarization, Robust Speech Recognition, Voice Recognition, Speaker Adaptation, Health Sciences, Computer Science, Deep Learning, Speech Communication, Deep Neural Networks, Multi-speaker Speech Recognition, Speech Processing, Speech Input, Speech Perception, Linguistics, Speaker Recognition
There has been little work on examining how deep neural networks may be adapted to speakers for improved speech recognition accuracy. Past work has examined using a discriminatively trained affine transformation of the input features applied at the frame level, or re-training an entire shallow network for a specific speaker. This work explores how deep neural networks may be adapted to speakers by re-training the input layer, the output layer, or the entire network. We examine how L2 regularization, implemented as weight decay toward the speaker-independent model, improves generalization. Other training factors are also examined, including the role of momentum and stochastic mini-batch versus batch training. While improvements are significant for smaller networks, the largest networks show little gain from adaptation on a large-vocabulary mobile speech recognition task.
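The regularization described in the abstract decays the adapted weights toward the speaker-independent (SI) model rather than toward zero. Below is a minimal sketch of that idea on a single linear layer with a toy least-squares objective; the variable names, data, and hyperparameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of speaker adaptation with L2 regularization toward the
# speaker-independent (SI) weights: loss = MSE + (lam/2) * ||W - W_si||^2.
# Toy data and all constants here are hypothetical.

rng = np.random.default_rng(0)

W_si = rng.normal(size=(4, 3))   # frozen speaker-independent weights
W = W_si.copy()                  # adapted weights, initialized from the SI model

X = rng.normal(size=(32, 4))     # adaptation frames for one speaker (toy features)
# Toy targets simulating a small speaker-specific shift away from the SI model:
Y = X @ (W_si + 0.1) + 0.01 * rng.normal(size=(32, 3))

lam = 0.1   # regularization strength: pulls W back toward W_si, not toward 0
lr = 0.05   # learning rate

for _ in range(200):
    err = X @ W - Y                               # prediction error on adaptation data
    grad = X.T @ err / len(X) + lam * (W - W_si)  # gradient of the regularized loss
    W -= lr * grad

# With larger lam, W stays closer to W_si, trading adaptation for generalization.
si_loss = np.linalg.norm(X @ W_si - Y)
adapted_loss = np.linalg.norm(X @ W - Y)
```

Decaying toward `W_si` instead of zero means the penalty discourages drifting far from the well-trained SI model on limited adaptation data, which is the generalization benefit the abstract refers to.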