Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data

Abstract

The lack of code-switching training data is a major concern in the development of end-to-end code-switching automatic speech recognition (ASR) models. In this work, we propose a method to train an improved end-to-end code-switching ASR model using only monolingual data. Our method encourages the output token embeddings of the two languages to have similar distributions, which makes it easier for the ASR model to code-switch between them. Specifically, we propose two constraints, one based on the Jensen-Shannon divergence and one based on the cosine distance. The former forces the output embeddings of the two languages to have similar distributions, while the latter simply brings the centroids of the two distributions close to each other. Experimental results demonstrate the effectiveness of the proposed method, which yields up to a 4.5% absolute improvement in mixed error rate on a Mandarin-English code-switching ASR task.
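To make the two constraints concrete, here is a minimal Python sketch on toy vectors. The cosine term follows the abstract directly: the cosine distance between the centroids (mean vectors) of the two languages' output token embeddings. The abstract does not say how each language's embedding distribution is estimated for the Jensen-Shannon term, so the sketch assumes, purely for illustration, that each language's embeddings are summarized as a diagonal Gaussian and estimates the JSD between the two Gaussians by Monte Carlo sampling; the paper's own formulation may differ. All function names (`cosine_constraint`, `fit_diag_gaussian`, `js_divergence_mc`) and the toy data are hypothetical. The code only computes values; in actual training such terms would presumably be computed in an autograd framework and added, with some weight, to the ASR loss.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence

Vector = Sequence[float]


def centroid(embeddings: Sequence[Vector]) -> List[float]:
    """Mean vector of a set of output token embeddings (one row per token)."""
    n, dim = len(embeddings), len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cos(u, v): 0 for parallel vectors, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def cosine_constraint(emb_a: Sequence[Vector], emb_b: Sequence[Vector]) -> float:
    """Cosine distance between the centroids of the two languages' embeddings."""
    return cosine_distance(centroid(emb_a), centroid(emb_b))


@dataclass
class DiagGaussian:
    mean: List[float]
    var: List[float]


def fit_diag_gaussian(embeddings: Sequence[Vector], eps: float = 1e-6) -> DiagGaussian:
    """ASSUMPTION (illustrative only): model one language's embeddings as a diagonal Gaussian."""
    mu = centroid(embeddings)
    n = len(embeddings)
    var = [sum((e[d] - mu[d]) ** 2 for e in embeddings) / n + eps for d in range(len(mu))]
    return DiagGaussian(mu, var)


def log_pdf(g: DiagGaussian, x: Vector) -> float:
    """Log density of a diagonal Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * var_d) + (x_d - mu_d) ** 2 / var_d)
               for x_d, mu_d, var_d in zip(x, g.mean, g.var))


def js_divergence_mc(p: DiagGaussian, q: DiagGaussian,
                     n_samples: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of JSD(p, q) in nats, using the mixture m = (p + q) / 2."""
    rng = random.Random(seed)

    def kl_to_mixture(a: DiagGaussian, b: DiagGaussian) -> float:
        # Estimates KL(a || m) = E_{x ~ a}[log a(x) - log m(x)].
        total = 0.0
        for _ in range(n_samples):
            x = [rng.gauss(mu_d, math.sqrt(var_d)) for mu_d, var_d in zip(a.mean, a.var)]
            la, lb = log_pdf(a, x), log_pdf(b, x)
            hi = max(la, lb)  # log-sum-exp trick for a stable log m(x)
            lm = math.log(0.5) + hi + math.log(math.exp(la - hi) + math.exp(lb - hi))
            total += la - lm
        return total / n_samples

    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)


# Hypothetical toy usage: 2-D "embeddings" standing in for Mandarin and English tokens.
zh_toy = [[1.0, 0.0], [1.2, 0.1], [0.8, -0.1]]
en_toy = [[1.0, 0.1], [1.1, 0.0], [0.9, 0.2]]
cos_term = cosine_constraint(zh_toy, en_toy)
jsd_term = js_divergence_mc(fit_diag_gaussian(zh_toy), fit_diag_gaussian(en_toy))
```

Because the mixture density is at least half of either component, each Monte Carlo term is bounded by log 2, so the estimate never exceeds the JSD's upper bound of log 2 nats. The cosine term, by contrast, looks only at the centroids, which matches the abstract's description of it as the simpler of the two constraints.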
