A robust language model incorporating a substring parser and extended n-grams

Abstract

This paper describes a language model for speech recognition that incorporates a substring parser (to take advantage of syntactic structure covered by a context-free grammar) and extended bigrams (to take advantage of remote dependencies between words). The use of extended bigrams significantly reduces perplexity, and a distribution clustering algorithm alleviates the additional storage cost. The substring parser is the foundation for training and scoring procedures based on paths at all levels through the syntactic structures, with subtrees linked by bigrams. The word bigram score is thereby absorbed into a grammar framework, consolidating the two kinds of language model, and again a significant reduction in perplexity is observed. The aim is an integrated, robust language model that adapts to the speaker.
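To make the idea of extended bigrams concrete, the following is a minimal sketch, not the paper's formulation. It assumes that an "extended bigram" pairs the predicted word with the word d positions back, that the distance-specific estimates are combined by fixed linear interpolation, and that add-alpha smoothing is used. The function names, the interpolation weights, and the toy corpus are all hypothetical. The substring parser and the distribution clustering step are not modelled here.

```python
import math
from collections import Counter


def train_distance_bigrams(sentences, max_distance=2):
    """Count pairs (w[i-d], w[i]) for every distance d = 1..max_distance."""
    pair_counts = {d: Counter() for d in range(1, max_distance + 1)}
    context_counts = {d: Counter() for d in range(1, max_distance + 1)}
    vocab = set()
    for sent in sentences:
        # Pad with start symbols so every word has a context at each distance.
        tokens = ["<s>"] * max_distance + list(sent)
        vocab.update(sent)
        for i in range(max_distance, len(tokens)):
            for d in range(1, max_distance + 1):
                pair_counts[d][(tokens[i - d], tokens[i])] += 1
                context_counts[d][tokens[i - d]] += 1
    return pair_counts, context_counts, vocab


def word_probability(word, history, model, weights, alpha=1.0):
    """Interpolate add-alpha smoothed distance-d bigram estimates.

    `weights` maps distance d -> interpolation weight (assumed to sum to 1).
    """
    pair_counts, context_counts, vocab = model
    vocab_size = len(vocab)
    prob = 0.0
    for d, weight in weights.items():
        context = history[-d]
        numerator = pair_counts[d][(context, word)] + alpha
        denominator = context_counts[d][context] + alpha * vocab_size
        prob += weight * numerator / denominator
    return prob


def perplexity(sentences, model, weights):
    """Per-word perplexity of `sentences` under the interpolated model."""
    max_distance = max(weights)
    log_sum, n_words = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] * max_distance + list(sent)
        for i in range(max_distance, len(tokens)):
            p = word_probability(tokens[i], tokens[:i], model, weights)
            log_sum += math.log2(p)
            n_words += 1
    return 2 ** (-log_sum / n_words)


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    ["the", "dog", "barks"],
    ["the", "cat", "sleeps"],
    ["a", "dog", "sleeps"],
]
model = train_distance_bigrams(corpus, max_distance=2)
plain_bigram = perplexity(corpus, model, {1: 1.0})
extended = perplexity(corpus, model, {1: 0.7, 2: 0.3})
print(f"bigram only: {plain_bigram:.3f}  with distance-2 pairs: {extended:.3f}")
```

Comparing the two printed values on held-out text is one simple way to check whether the longer-distance pairs help on a given corpus. Note that each extra distance adds a full table of pair counts, which is the storage cost that the abstract says clustering is meant to alleviate.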
