Publication | Closed Access
Topic-Dependent Language Model with Voting on Noun History
Citations: 12
References: 28
Year: 2010
Engineering, Spoken Language Processing, Corpus Linguistics, Text Mining, Word Embeddings, Natural Language Processing, Speech Recognition, Information Retrieval, Data Science, Computational Linguistics, Language Engineering, Topic-dependent Language Model, Language Studies, Language Models, Distributional Semantics, ASR Systems, Topic Model, Speech Processing, Long-range Dependencies, Linguistics
Language models (LMs) are an important area of study in automatic speech recognition (ASR). An LM helps the acoustic model find the word sequence that corresponds to a given speech signal. Without one, an ASR system has no knowledge of the language, and finding the correct word sequence is difficult. In the past few years, researchers have tried to incorporate long-range dependencies into statistical word-based n-gram LMs. One such long-range dependency is topic. Unlike words, topic is unobservable, so it must be inferred from the meanings behind the words. This research is based on the belief that nouns carry topic information. We propose a new approach for a topic-dependent LM in which the topic is decided in an unsupervised manner. Latent Semantic Analysis (LSA) is employed to reveal hidden (latent) relations among nouns in the context words. To decide the topic of an event, a fixed-size word history sequence (window) is observed, and voting is then carried out based on noun class occurrences weighted by a confidence measure. Experiments were conducted on an English corpus and a Japanese corpus: the Wall Street Journal corpus and the Mainichi Shimbun (Japanese newspaper) corpus. The results show that the proposed method gives better perplexity than the comparative baselines, including a word-based/class-based n-gram LM, their interpolated LM, a cache-based LM, a topic-dependent LM based on n-gram, and a topic-dependent LM based on Latent Dirichlet Allocation (LDA). N-best list rescoring was also conducted to validate the method's applicability to ASR systems.
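The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `vote_topic`, the `noun_class` and `confidence` dictionaries, the window size, and the toy data are all hypothetical. The abstract does not specify how noun classes are derived from LSA or how the confidence measure is computed, so both are taken here as precomputed inputs.

```python
from collections import defaultdict


def vote_topic(history, noun_class, confidence, window=5):
    """Pick a topic by confidence-weighted voting over a fixed-size word window.

    history    : list of preceding words, oldest first
    noun_class : hypothetical mapping noun -> class label (assumed to come
                 from some LSA-based clustering, which is not shown here)
    confidence : hypothetical mapping noun -> vote weight; nouns without an
                 entry vote with weight 1.0
    """
    scores = defaultdict(float)
    for word in history[-window:]:
        topic = noun_class.get(word)
        if topic is None:
            # Words that are not known nouns cast no vote.
            continue
        scores[topic] += confidence.get(word, 1.0)
    if not scores:
        # No noun in the window, so no topic can be decided.
        return None
    # The class with the largest accumulated weight wins the vote.
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
history = ["the", "bank", "raised", "interest", "rates", "after", "the", "game"]
noun_class = {"bank": "finance", "interest": "finance",
              "rates": "finance", "game": "sports"}
confidence = {"bank": 0.6, "interest": 0.8, "rates": 0.9, "game": 0.7}

print(vote_topic(history, noun_class, confidence, window=5))  # finance
```

In this toy run the last five words contain two nouns of the "finance" class and one of the "sports" class, so the weighted vote selects "finance". In a topic-dependent LM, the selected class would then choose which topic-specific model scores the next word.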