Publication | Closed Access
Unsupervised cleansing of noisy text
Year: 2010 | Citations: 65 | References: 15
Topics: Noisy Sentence, Engineering, Machine Learning, Parallel Text, Corpus Linguistics, Text Mining, Speech Recognition, Natural Language Processing, Noisy Text, Text Segmentation, Computational Linguistics, Language Engineering, Language Studies, Machine Translation, Computer-assisted Translation, Computer Science, Neural Machine Translation, Text Normalization, Speech Processing, Text Processing, Linguistics
In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communication such as Short Message Service (SMS), Twitter, and chat. A typical statistical machine translation (SMT) system for this task is trained on parallel text made up of noisy sentences and their clean counterparts. We instead propose an unsupervised method for translating noisy text into clean text that does not rely on such parallel data. The method has two steps. First, for a given noisy sentence, a weighted list of possible clean tokens is obtained for each noisy token. Second, the clean sentence is obtained by maximizing the product of the list weights and the language model scores.
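The second step can be read as a search over a lattice with one position per noisy token and one node per candidate clean token. The sketch below is an illustration of that objective under stated assumptions, not the authors' implementation: it assumes a bigram language model and finds the best path with Viterbi search, summing log-probabilities instead of multiplying raw scores. The abstract does not say how the candidate lists are built or which language model is used, so the candidate lists, their weights, the bigram probabilities, the `floor` value for unseen bigrams, and the function name `cleanse` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical inputs: for each noisy token, a list of (clean candidate, weight).
Lattice = List[List[Tuple[str, float]]]
# Hypothetical bigram language model: (previous word, word) -> P(word | previous word).
Bigram = Dict[Tuple[str, str], float]


def cleanse(lattice: Lattice, lm: Bigram, floor: float = 1e-6) -> List[str]:
    """Choose one candidate per position so that the product of candidate
    weights and bigram LM probabilities is maximal (Viterbi search).
    Log-probabilities are summed to avoid numeric underflow."""

    def lp(prev: str, word: str) -> float:
        # Assumption: unseen bigrams receive a small floor probability.
        return math.log(lm.get((prev, word), floor))

    # Map: last clean word -> (best log score so far, best path ending in that word).
    states: Dict[str, Tuple[float, List[str]]] = {"<s>": (0.0, [])}
    for options in lattice:
        new_states: Dict[str, Tuple[float, List[str]]] = {}
        for word, weight in options:
            # Best predecessor for this candidate under the bigram LM.
            prev, (s, p) = max(states.items(),
                               key=lambda kv: kv[1][0] + lp(kv[0], word))
            score = s + math.log(weight) + lp(prev, word)
            if word not in new_states or score > new_states[word][0]:
                new_states[word] = (score, p + [word])
        states = new_states

    # Add the end-of-sentence transition and return the best full path.
    _, best_path = max(
        ((s + lp(w, "</s>"), p) for w, (s, p) in states.items()),
        key=lambda t: t[0],
    )
    return best_path


if __name__ == "__main__":
    # Hypothetical candidates for the noisy sentence "c u 2morrow".
    lattice = [
        [("see", 0.6), ("sea", 0.3), ("c", 0.1)],
        [("you", 0.9), ("u", 0.1)],
        [("tomorrow", 0.8), ("morrow", 0.2)],
    ]
    lm = {("<s>", "see"): 0.2, ("<s>", "sea"): 0.05, ("see", "you"): 0.5,
          ("sea", "you"): 0.01, ("you", "tomorrow"): 0.3,
          ("tomorrow", "</s>"): 0.4}
    print(cleanse(lattice, lm))  # prints ['see', 'you', 'tomorrow']
```

Because the score multiplies candidate weights by language model probabilities, context can override a candidate's own weight: a lower-weighted candidate wins when it fits its neighbours much better under the language model.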