Publication | Closed Access
Syllable-based large vocabulary continuous speech recognition
Citations: 124
References: 16
Year: 2001
Engineering, Spoken Language Processing, Phonology, Speech Recognition, Natural Language Processing, Fundamental Acoustic Unit, Phonetics, Computational Linguistics, Robust Speech Recognition, Voice Recognition, Language Studies, Triphone System, Computer Science, Speech Communication, Speech Technology, Syllable Unit, Speech Processing, Speech Input, Speech Perception, Linguistics
Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit on telephone-bandwidth speech. This effort is motivated by the inherent limitations of phone-based approaches, namely the lack of an easy and efficient way to model long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system in terms of both word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. On this task, the syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.