Publication | Closed Access
A robust language model incorporating a substring parser and extended n-grams
Citations: 10 | References: 5 | Year: 2002
Unknown Venue
Topics: Syntactic Parsing, Engineering, Robust Language Model, Spoken Language Processing, Corpus Linguistics, Language Processing, Speech Recognition, Natural Language Processing, Syntax, Computational Linguistics, Language Engineering, Grammar, Language Studies, Spoken Language Understanding, Machine Translation, Language Modeling (Natural Language Processing), Computer Science, Shallow Parsing, Treebanks, Language Recognition, Speech Processing, Language Modeling (Theoretical Linguistics), Remote Dependencies, Linguistics, PoS Tagging
Describes a language model for speech recognition that incorporates a substring parser (to exploit syntactic structure covered by a context-free grammar) and extended bigrams (to exploit remote dependencies between words). Extended bigrams significantly reduce perplexity, and a distribution clustering algorithm alleviates the additional storage cost they incur. The substring parser is the foundation for training and scoring procedures based on paths at all levels through the syntactic structures, with subtrees linked by bigrams. The word bigram score is therefore absorbed into a grammar framework, consolidating the two kinds of language model, and again a significant reduction in perplexity is observed. The aim is an integrated, robust language model that adapts to the speaker.
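The abstract does not define "extended bigram" precisely. A minimal sketch follows, assuming it means a word pair separated by a gap of d positions (a distance or skip bigram), so that P(w_i | w_{i-d}) can capture a dependency the ordinary bigram (d = 1) misses. The class `DistanceBigramModel`, the add-alpha smoothing, the linear interpolation weights and the toy corpus are all hypothetical choices made for illustration, not the paper's method. The distribution clustering step and the substring parser are not modelled here. Perplexity is computed in the standard way, as exp of the mean negative log probability over the scored positions.

```python
import math
from collections import Counter


class DistanceBigramModel:
    """Hypothetical sketch: add-alpha smoothed P_d(w | v), where v occurs d positions before w."""

    def __init__(self, tokens, vocab, d, alpha=1.0):
        self.d = d
        self.alpha = alpha
        self.vocab_size = len(vocab)
        self.pair_counts = Counter()     # c_d(v, w)
        self.history_counts = Counter()  # c_d(v): times v starts a distance-d pair
        for i in range(d, len(tokens)):
            v, w = tokens[i - d], tokens[i]
            self.pair_counts[(v, w)] += 1
            self.history_counts[v] += 1

    def prob(self, v, w):
        # Additive smoothing keeps every probability non-zero and sums to 1 over vocab.
        num = self.pair_counts[(v, w)] + self.alpha
        den = self.history_counts[v] + self.alpha * self.vocab_size
        return num / den


def interpolated_prob(models, weights, tokens, i):
    """P(w_i | history) = sum_k weights[k] * P_{d_k}(w_i | w_{i-d_k}); assumed linear mixing."""
    return sum(lam * m.prob(tokens[i - m.d], tokens[i]) for m, lam in zip(models, weights))


def perplexity(models, weights, tokens, start):
    """exp(-mean log P) over positions start..len(tokens)-1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    if start < max(m.d for m in models):
        raise ValueError("start must leave room for the longest distance")
    log_sum = sum(math.log(interpolated_prob(models, weights, tokens, i))
                  for i in range(start, len(tokens)))
    return math.exp(-log_sum / (len(tokens) - start))


# Toy data (hypothetical): "that" separates subject nouns from the verb "ran".
train = "the dog that barked ran home the cat that slept ran away".split()
test = "the dog that slept ran home".split()
vocab = set(train) | set(test)

bigram = DistanceBigramModel(train, vocab, d=1)
skip2 = DistanceBigramModel(train, vocab, d=2)

# Score both models from the same start position so the numbers are comparable.
pp_bigram = perplexity([bigram], [1.0], test, start=2)
pp_mixed = perplexity([bigram, skip2], [0.7, 0.3], test, start=2)
print(f"bigram only: {pp_bigram:.2f}, bigram + distance-2: {pp_mixed:.2f}")
```

Each extra distance d adds another table of pair counts, which is the storage cost the abstract says its clustering algorithm is meant to reduce. Whether the mixture lowers perplexity on a given corpus depends on the data and on the weights, which in practice would be tuned on held-out text.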