Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition

Abstract

It is well-known in machine learning that multitask learning (MTL) can help improve the generalization performance of singly learning tasks if the tasks being trained in parallel are related, especially when the amount of training data is relatively small. In this paper, we investigate the estimation of triphone acoustic models in parallel with the estimation of trigrapheme acoustic models under the MTL framework using deep neural network (DNN). As triphone modeling and trigrapheme modeling are highly related learning tasks, a better shared internal representation (the hidden layers) can be learned to improve their generalization performance. Experimental evaluation on three low-resource South African languages shows that triphone DNNs trained by the MTL approach perform significantly better than triphone DNNs that are trained by the single-task learning (STL) approach by ~3-13%. The MTL-DNN triphone models also outperform the ROVER result that combines a triphone STL-DNN and a trigrapheme STL-DNN.

References

Page 1

	Year	Citations

Page 1