Acoustic modeling for Chinese speech recognition: a comparative study of Mandarin and Cantonese

Abstract

This paper presents a comparative study of automatic speech recognition for two Chinese dialects, Mandarin and Cantonese. It focuses on decision-tree-based context-dependent acoustic modeling for large-vocabulary continuous speech recognition. Extensive phonological and phonetic knowledge is incorporated into the design of questions concerning the left and right contexts of the sub-syllable units, namely INITIALs and FINALs. This results in a set of class-triphone models for each dialect. Syllable recognition accuracies of 81.7% and 75.5% are attained for Mandarin and Cantonese, respectively. This performance gap can be attributed to several linguistic and practical factors, including: 1) phonological and phonetic discrepancies between the two dialects; 2) the design of the training databases; and 3) the design of the phonetic questions used in decision-tree clustering.
