Publication | Closed Access
Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams
Citations: 129
References: 8
Year: 2002
Unknown Venue
Engineering, Cross-lingual Representation, Theoretical Formulation, Multigram Model, Spoken Language Processing, Multilingual Pretraining, Large Language Model, Phonology, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Memoryless Source, Information Retrieval, Data Science, Computational Linguistics, Language Engineering, Variable Length Sequences, Grammar, Machine Translation, Sequence Modelling, Linguistics, Language Technology, Computer Science, Language Recognition, Speech Processing, Arts, Language Modeling, Speech Translation
The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. Estimating the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed with an iterative expectation-maximization algorithm, and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams on this task.
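The estimation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`forward`, `backward`, `em_step`, `uniform_init`, `perplexity`), the uniform initialization, the absence of pruning or smoothing, and the per-word normalization of perplexity are all assumptions made for illustration. The sketch treats a sentence's likelihood as the sum, over all segmentations into sequences of at most `max_len` words, of the product of the sequence probabilities.

```python
from collections import defaultdict
import math

def forward(words, probs, max_len):
    # alpha[t] = total probability of all segmentations of words[:t]
    T = len(words)
    alpha = [0.0] * (T + 1)
    alpha[0] = 1.0
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            seg = tuple(words[t - l:t])
            alpha[t] += alpha[t - l] * probs.get(seg, 0.0)
    return alpha

def backward(words, probs, max_len):
    # beta[t] = total probability of all segmentations of words[t:]
    T = len(words)
    beta = [0.0] * (T + 1)
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for l in range(1, min(max_len, T - t) + 1):
            seg = tuple(words[t:t + l])
            beta[t] += probs.get(seg, 0.0) * beta[t + l]
    return beta

def em_step(corpus, probs, max_len):
    # E-step: expected number of times each sequence is used as a segment.
    counts = defaultdict(float)
    for words in corpus:
        alpha = forward(words, probs, max_len)
        beta = backward(words, probs, max_len)
        likelihood = alpha[-1]
        if likelihood == 0.0:
            continue  # sentence cannot be segmented under current parameters
        T = len(words)
        for t in range(T):
            for l in range(1, min(max_len, T - t) + 1):
                seg = tuple(words[t:t + l])
                p = probs.get(seg, 0.0)
                if p > 0.0:
                    counts[seg] += alpha[t] * p * beta[t + l] / likelihood
    # M-step: renormalize expected counts into a probability distribution.
    total = sum(counts.values())
    return {seg: c / total for seg, c in counts.items()}

def uniform_init(corpus, max_len):
    # Assumed initialization: equal probability for every observed sequence.
    segs = set()
    for words in corpus:
        for t in range(len(words)):
            for l in range(1, min(max_len, len(words) - t) + 1):
                segs.add(tuple(words[t:t + l]))
    return {seg: 1.0 / len(segs) for seg in segs}

def perplexity(corpus, probs, max_len):
    # Per-word perplexity (assumed normalization): exp(-log-likelihood / words).
    log_lik = sum(math.log(forward(w, probs, max_len)[-1]) for w in corpus)
    n_words = sum(len(w) for w in corpus)
    return math.exp(-log_lik / n_words)

# Toy data, purely illustrative (not drawn from ATIS).
corpus = [["a", "b", "a", "b"], ["a", "b"]]
probs0 = uniform_init(corpus, max_len=2)
probs1 = em_step(corpus, probs0, max_len=2)
```

Repeated calls to `em_step` should not increase the training-set perplexity, since each EM iteration cannot decrease the likelihood. A practical system would also work in log space to avoid underflow on long sentences and would need some treatment of sequences unseen in training, neither of which is shown here.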