Publication | Closed Access
Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition
Citations: 86
References: 19
Year: 2014
Unknown Venue
Engineering, Machine Learning, Speech Recognition, Data Science, Robust Speech Recognition, Joint Acoustic Modeling, Multi-task Learning, Voice Recognition, MTL Framework, Computer Science, Low-resource Speech Recognition, Deep Learning, Distant Speech Recognition, Deep Neural Network, Speech Communication, Deep Neural Networks, Multi-speaker Speech Recognition, Speech Processing, Multitask Learning, Speech Input
It is well known in machine learning that multi-task learning (MTL) can improve the generalization performance of individual learning tasks when the tasks trained in parallel are related, especially when the amount of training data is relatively small. In this paper, we investigate the estimation of triphone acoustic models in parallel with the estimation of trigrapheme acoustic models under the MTL framework using deep neural networks (DNNs). Because triphone modeling and trigrapheme modeling are highly related learning tasks, a better shared internal representation (the hidden layers) can be learned, which improves the generalization performance of both. Experimental evaluation on three low-resource South African languages shows that triphone DNNs trained by the MTL approach perform significantly better, by ~3-13%, than triphone DNNs trained by the single-task learning (STL) approach. The MTL-DNN triphone models also outperform the ROVER result that combines a triphone STL-DNN and a trigrapheme STL-DNN.
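The shared-representation idea in the abstract can be illustrated with a minimal sketch: one stack of hidden layers feeds two softmax output layers, one over triphone states and one over trigrapheme states, and the training loss for a frame is the sum of the two cross-entropies. This is not the authors' implementation. The layer sizes, sigmoid activations, equal weighting of the two losses, and all function names (`build_mtl_dnn`, `forward`, `mtl_loss`) are assumptions made for illustration only.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for a single input vector x."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_mtl_dnn(n_features, hidden_sizes, n_triphone_states, n_trigrapheme_states, seed=0):
    """Hypothetical MTL-DNN: shared hidden layers plus one output layer per task."""
    rng = random.Random(seed)
    sizes = [n_features] + list(hidden_sizes)
    shared = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
    return {
        "shared": shared,
        "triphone_head": init_layer(sizes[-1], n_triphone_states, rng),
        "trigrapheme_head": init_layer(sizes[-1], n_trigrapheme_states, rng),
    }


def forward(model, x):
    """Return (triphone posteriors, trigrapheme posteriors) for one acoustic frame."""
    h = x
    for layer in model["shared"]:
        h = sigmoid(affine(layer, h))  # representation shared by both tasks
    p_phone = softmax(affine(model["triphone_head"], h))
    p_graph = softmax(affine(model["trigrapheme_head"], h))
    return p_phone, p_graph


def mtl_loss(model, x, triphone_label, trigrapheme_label):
    """Sum of the two cross-entropies (equal task weighting is an assumption)."""
    p_phone, p_graph = forward(model, x)
    return -math.log(p_phone[triphone_label]) - math.log(p_graph[trigrapheme_label])


# Usage with toy dimensions; real systems use far larger layers and state inventories.
model = build_mtl_dnn(n_features=8, hidden_sizes=[16, 16],
                      n_triphone_states=5, n_trigrapheme_states=4)
frame = [0.1] * 8
print(mtl_loss(model, frame, triphone_label=2, trigrapheme_label=1))
```

Because gradients from both output layers would flow into the same hidden layers during training, the trigrapheme task acts as an auxiliary signal for the triphone task. In a sketch like this, only the shared layers and the triphone output layer would be needed at recognition time for the triphone system.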