Syllable-Length Acoustic Units in Large-Vocabulary Continuous Speech Recognition

Abstract

Recent research on the TIMIT corpus suggests that longer-length acoustic units are better suited for modelling coarticulation and long-term temporal dependencies in speech than conventional context-dependent phone models. However, the impressive results achieved on TIMIT [1] are yet to be reproduced on other corpora, such as read speech from the Spoken Dutch Corpus. Differences between TIMIT and the Spoken Dutch Corpus data are analysed in an attempt to better understand in which conditions the use of longer-length units can be expected to result in considerable improvements in recognition accuracy. We conclude that at least part of the improvements found with TIMIT can be explained by details of the experimental procedure, and that longer-length left-to-right HMMs that borrow their topology from a sequence of triphones are only able to capture part of the pronunciation variation present in speech.
