Publication | Open Access
Hierarchical Phoneme Classification for Improved Speech Recognition
Year: 2021
Citations: 17
References: 27
Engineering, Spoken Language Processing, Speech Recognition, Natural Language Processing, Data Science, Pattern Recognition, Phonetics, Hierarchical Phoneme, Robust Speech Recognition, Voice Recognition, Language Studies, TIMIT Database, Hierarchical Phoneme Classification, Speech Communication, Language Recognition, Speech Processing, Speech Input, Speech Perception, Linguistics
Speech recognition converts input sound into a sequence of phonemes and then uses language models to find the text that matches the input. Phoneme classification performance is therefore a critical factor in the successful implementation of a speech recognition system. However, correctly distinguishing phonemes with similar characteristics remains challenging even for state-of-the-art classification methods, and classification errors are hard to recover from in the subsequent language processing steps. This paper proposes a hierarchical phoneme clustering method that applies recognition models better suited to different phonemes. The phonemes of the TIMIT database are analyzed using a confusion matrix from a baseline speech recognition model. From the automatic phoneme clustering results, a set of phoneme classification models optimized for the generated phoneme groups is constructed and integrated into a hierarchical phoneme classification method. In phoneme classification experiments, the proposed hierarchical phoneme group models improved performance over the baseline by 3%, 2.1%, 6.0%, and 2.2% for fricative, affricate, stop, and nasal sounds, respectively. Average accuracy was 69.5% for the baseline and 71.7% for the proposed hierarchical models, an overall improvement of 2.2 percentage points.
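The pipeline in the abstract (confusion matrix, automatic phoneme grouping, then group-specific classifiers) can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not name the clustering algorithm or the models, so greedy average-linkage merging stands in for the clustering step, the confusion counts are invented toy data rather than TIMIT results, and `cluster_phonemes`, `classify_hierarchical`, `pick_group`, and `group_models` are hypothetical names.

```python
from itertools import combinations


def cluster_phonemes(labels, confusion, n_groups):
    """Group phonemes that a baseline model tends to confuse with each other.

    confusion[i][j] is the number of times phoneme i was classified as j.
    Assumption: greedy average-linkage merging, used here only as a stand-in.
    """
    n = len(labels)
    # Row-normalize so frequent phonemes do not dominate, then symmetrize.
    rates = [[c / (sum(row) or 1) for c in row] for row in confusion]
    sim = [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]

    groups = [[i] for i in range(n)]

    def linkage(pair):
        a, b = pair
        total = sum(sim[i][j] for i in groups[a] for j in groups[b])
        return total / (len(groups[a]) * len(groups[b]))

    while len(groups) > n_groups:
        # Merge the two groups with the highest mean mutual confusion.
        a, b = max(combinations(range(len(groups)), 2), key=linkage)
        groups[a] = groups[a] + groups[b]
        del groups[b]  # safe: combinations() guarantees a < b
    return [[labels[i] for i in g] for g in groups]


def classify_hierarchical(features, pick_group, group_models):
    """Two-stage decision: choose a phoneme group, then a phoneme within it."""
    group_id = pick_group(features)
    return group_models[group_id](features)


# Toy confusion counts (invented for illustration, not taken from TIMIT).
LABELS = ["s", "z", "p", "b", "m", "n"]
CONFUSION = [
    [80, 15, 2, 1, 1, 1],
    [18, 75, 2, 2, 2, 1],
    [1, 1, 70, 25, 2, 1],
    [1, 2, 22, 72, 2, 1],
    [1, 1, 1, 2, 70, 25],
    [1, 1, 2, 1, 28, 67],
]

groups = cluster_phonemes(LABELS, CONFUSION, n_groups=3)
print(groups)
```

In a real system, `pick_group` and each entry of `group_models` would be trained classifiers; the sketch only fixes how their decisions are chained, so that each second-stage model has to separate only the phonemes inside one confusable group.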