Publication | Closed Access
Language model adaptation using latent dirichlet allocation and an efficient topic inference algorithm
Citations: 42
References: 8
Year: 2007
Unknown Venue
Latent Dirichlet Allocation, Engineering, Multilingual Pretraining, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Speech Recognition, LDA Model, Language Model Adaptation, Latent Modeling, Information Retrieval, Language Adaptation, Computational Linguistics, Language Studies, Statistics, Machine Translation, NLP Task, Distributional Semantics, Topic Model, Background Language Model, Linguistics
We present an effort to perform topic mixture-based language model adaptation using latent Dirichlet allocation (LDA). We use probabilistic latent semantic analysis (PLSA) to automatically cluster a heterogeneous training corpus, and train an LDA model using the resultant topic-document assignments. Using this LDA model, we then construct topic-specific corpora at the utterance level for interpolation with a background language model during language model adaptation. We also present a novel iterative algorithm for LDA topic inference. Very encouraging results were obtained in preliminary experiments with broadcast news in Mandarin Chinese.
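The abstract does not spell out the inference algorithm or the interpolation formula, so the following is only a minimal sketch of the general idea of topic mixture-based adaptation, not the authors' method. It assumes unigram topic models held fixed, uses a generic EM-style update as a stand-in for topic inference, and combines the resulting topic mixture with a background model by linear interpolation. The function names, the `lam` interpolation weight, and the `floor` smoothing constant are all hypothetical.

```python
from typing import Dict, List


def infer_topic_weights(
    words: List[str],
    topic_word_probs: List[Dict[str, float]],
    iterations: int = 20,
    floor: float = 1e-12,
) -> List[float]:
    """Estimate mixture weights over fixed topic unigram models.

    Generic EM-style iteration, used here as a stand-in; it is NOT the
    inference algorithm proposed in the paper.
    """
    k = len(topic_word_probs)
    weights = [1.0 / k] * k  # start from a uniform mixture
    if not words:
        return weights
    for _ in range(iterations):
        counts = [0.0] * k
        for w in words:
            # Posterior responsibility of each topic for this word.
            joint = [weights[t] * topic_word_probs[t].get(w, floor) for t in range(k)]
            total = sum(joint)
            for t in range(k):
                counts[t] += joint[t] / total
        # Re-estimate weights as the average responsibility per word.
        weights = [c / len(words) for c in counts]
    return weights


def adapted_prob(
    word: str,
    background: Dict[str, float],
    topic_word_probs: List[Dict[str, float]],
    weights: List[float],
    lam: float = 0.5,  # hypothetical interpolation weight
    floor: float = 1e-12,
) -> float:
    """Linearly interpolate a background unigram model with the topic mixture."""
    topic_p = sum(wt * tp.get(word, floor) for wt, tp in zip(weights, topic_word_probs))
    return lam * background.get(word, floor) + (1.0 - lam) * topic_p
```

In this toy setting, the words of an utterance would be passed to `infer_topic_weights`, and the returned weights would then be used in `adapted_prob` to score words under the adapted model. A real system would typically interpolate n-gram models rather than unigrams and tune the interpolation weight on held-out data.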