Statistical language modeling using a variable context length
48 citations · 9 references · 2002 · Unknown venue
Keywords: Natural Language Processing, Syntax, Engineering, Statistical Language Models, Data Science, Computational Linguistics, Linguistics, Variable Context Length, Spoken Language Processing, Grammar, Distributional Semantics, Language Studies, Lexical Complexity Prediction, Large Language Model, Corpus Linguistics, Text Mining, Machine Translation
In this paper we investigate statistical language models with a variable context length. In such models the number of relevant words in a context is not fixed, as in conventional M-gram models, but depends on the context itself. We develop a measure for the quality of variable-length models and, based on this measure, present a pruning algorithm for creating them. Further, we address the question of how the use of a special backing-off distribution can improve the language models. Experiments were performed on two databases, the ARPA NAB corpus and the German Verbmobil corpus. The results show that variable-length models outperform conventional models of the same size. Furthermore, if a moderate loss in performance is acceptable, the size of a language model can be reduced drastically by using the presented pruning algorithm.
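The abstract does not give the quality measure or the pruning algorithm in detail, but the general idea — keep a long context only when its conditional word distribution differs enough from that of its shortened (backed-off) context — can be sketched as follows. This is a hypothetical illustration using a weighted Kullback-Leibler divergence as the quality measure; the function names, the data layout (context tuples mapping to word-count dictionaries), and the threshold are all assumptions, not the paper's actual formulation.

```python
import math

def kl_weight(child_counts, parent_counts, context_count, total):
    """Weighted KL divergence between a long context's word distribution
    and the distribution of its one-word-shorter (backoff) context.
    Larger values mean the longer context carries more information."""
    child_total = sum(child_counts.values())
    parent_total = sum(parent_counts.values())
    kl = 0.0
    for w, c in child_counts.items():
        p = c / child_total
        q = parent_counts.get(w, 0) / parent_total
        if q > 0:
            kl += p * math.log(p / q)
    # Weight the divergence by how often the long context occurs,
    # so rare contexts are pruned more readily.
    return (context_count / total) * kl

def prune_contexts(counts, threshold):
    """counts: dict mapping context tuples (most distant word first)
    to {word: count}. Keep a long context only if its weighted
    divergence from the shortened context exceeds the threshold."""
    total = sum(sum(d.values()) for ctx, d in counts.items() if len(ctx) == 1)
    kept = {}
    for ctx, dist in counts.items():
        if len(ctx) <= 1:
            kept[ctx] = dist          # always keep the shortest contexts
            continue
        parent = counts.get(ctx[1:])  # drop the most distant word
        if parent is None:
            continue
        if kl_weight(dist, parent, sum(dist.values()), total) > threshold:
            kept[ctx] = dist
    return kept

# Toy example: the bigram context ('b', 'a') predicts very differently
# from the unigram context ('a',), so it survives pruning; the context
# ('c', 'a') matches its backoff distribution and is removed.
counts = {
    ('a',): {'x': 5, 'y': 5},
    ('b', 'a'): {'x': 5},
    ('c', 'a'): {'x': 2, 'y': 2},
}
pruned = prune_contexts(counts, 0.1)
```

The resulting model is variable-length in exactly the sense the abstract describes: some predictions use the full M-gram context, while others fall back to shorter contexts wherever the extra history adds too little information relative to its cost in model size.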