Publication | Open Access
Getting more mileage from web text sources for conversational speech language modeling using class-dependent mixtures
Citations: 135
References: 16
Year: 2003
Unknown Venue
Engineering, Machine Learning, Speech Corpus, Spoken Language Processing, Spoken Dialog System, Multilingual Pretraining, Conversational Speech, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Data Science, Computational Linguistics, Conversational Speech Language, Conversation Analysis, Class-dependent Mixtures, Language Studies, Machine Translation, Training Data, Speech Communication, Speech Technology, Speech Analysis, Language Recognition, Speech Processing, Speech Input, Web Text Sources, Language Modeling, Linguistics
Sources of training data suitable for language modeling of conversational speech are limited. In this paper, we show that training data can be supplemented with text from the web, filtered to match the style and/or topic of the target recognition task. We also show that larger performance gains can be obtained from this data by using class-dependent interpolation of N-grams.
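The abstract does not spell out the interpolation formula, so the following is only a minimal sketch of what class-dependent interpolation of N-gram models can look like. It assumes that the mixture weights are selected by the class of the immediately preceding word and that the weights are already given (in practice they would be estimated, for example on held-out data). The names `class_dependent_mixture`, `weights_by_class`, `word_to_class`, the toy component models, and the `FILLER`/`OTHER` classes are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence

# A component model maps (word, history) to a probability P_s(word | history).
ComponentLM = Callable[[str, Sequence[str]], float]


def class_dependent_mixture(
    word: str,
    history: Sequence[str],
    components: List[ComponentLM],
    weights_by_class: Dict[str, List[float]],
    word_to_class: Dict[str, str],
    default_class: str = "OTHER",
) -> float:
    """Interpolate component LMs with weights chosen by the history's class.

    Assumption for this sketch: the class is that of the last history word.
    """
    if history:
        cls = word_to_class.get(history[-1], default_class)
    else:
        cls = default_class
    # Fall back to the default class weights if this class has none.
    lambdas = weights_by_class.get(cls, weights_by_class[default_class])
    return sum(lam * lm(word, history) for lam, lm in zip(lambdas, components))


# Toy component models (hypothetical numbers, history ignored for brevity).
def in_domain_lm(word: str, history: Sequence[str]) -> float:
    return {"uh": 0.5, "yeah": 0.3, "report": 0.2}.get(word, 0.0)


def web_lm(word: str, history: Sequence[str]) -> float:
    return {"uh": 0.1, "yeah": 0.1, "report": 0.8}.get(word, 0.0)


# One weight vector per class; each vector sums to 1.
weights_by_class = {"FILLER": [0.9, 0.1], "OTHER": [0.5, 0.5]}
word_to_class = {"uh": "FILLER", "um": "FILLER"}

p_after_filler = class_dependent_mixture(
    "report", ["uh"], [in_domain_lm, web_lm], weights_by_class, word_to_class
)
p_after_other = class_dependent_mixture(
    "report", ["the"], [in_domain_lm, web_lm], weights_by_class, word_to_class
)
```

In this toy setup the in-domain model is trusted more after a filler word, while the web model gets equal weight elsewhere. A class-independent mixture is the special case where every class shares the same weight vector.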