Publication | Closed Access
Lexical normalization for social media text
Citations: 187
References: 55
Year: 2013
Engineering, Probable Correction Candidate, Corpus Linguistics, Text Mining, Speech Recognition, Applied Linguistics, Natural Language Processing, Information Retrieval, Data Science, Correction Candidates, Computational Linguistics, Language Engineering, Word Similarity, Language Studies, Content Analysis, Machine Translation, NLP Task, Knowledge Discovery, Distributional Semantics, Text Normalization, Lexical Resource, Lexical Normalization, Lexical Complexity Prediction, Text Processing, Linguistics
Twitter provides access to large volumes of data in real time, but is notoriously noisy, hampering its utility for NLP. In this article, we target out-of-vocabulary words in short text messages and propose a method for identifying and normalizing lexical variants. Our method uses a classifier to detect lexical variants, and generates correction candidates based on morphophonemic similarity. Both word similarity and context are then exploited to select the most probable correction candidate for each word. The proposed method does not require any annotations, and achieves state-of-the-art performance on an SMS corpus and a novel dataset based on Twitter.
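The detect, generate, and select steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain lexicon lookup stands in for the classifier, character overlap from the standard library's `difflib.SequenceMatcher` stands in for morphophonemic similarity, and a toy bigram table stands in for context. The names `VOCAB`, `BIGRAMS`, `is_variant`, `generate_candidates`, `context_score`, and `normalize`, along with the thresholds and weights, are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher

# Toy lexicon and bigram counts: illustrative assumptions, not data from the article.
VOCAB = {"see", "you", "to", "too", "tomato", "tomorrow", "together"}
BIGRAMS = {("see", "you"): 8, ("you", "tomorrow"): 5, ("you", "too"): 2}


def is_variant(token):
    """Stand-in detector: flag any token missing from the lexicon."""
    return token.lower() not in VOCAB


def generate_candidates(token, min_ratio=0.5):
    """Return (word, similarity) pairs whose character overlap meets min_ratio."""
    scored = []
    for word in sorted(VOCAB):
        ratio = SequenceMatcher(None, token.lower(), word).ratio()
        if ratio >= min_ratio:
            scored.append((word, ratio))
    return scored


def context_score(previous, candidate):
    """Relative frequency of `candidate` after `previous` in the toy bigram table."""
    if previous is None:
        return 0.0
    total = sum(n for (first, _), n in BIGRAMS.items() if first == previous)
    return BIGRAMS.get((previous, candidate), 0) / total if total else 0.0


def normalize(tokens, context_weight=0.5):
    """Replace each flagged token with its best-scoring candidate, if any."""
    output = []
    for token in tokens:
        if not is_variant(token):
            output.append(token)
            continue
        previous = output[-1].lower() if output else None
        candidates = generate_candidates(token)
        if not candidates:
            output.append(token)  # no plausible correction: leave unchanged
            continue
        # Combine string similarity with weighted context evidence.
        best_word, _ = max(
            candidates,
            key=lambda pair: pair[1] + context_weight * context_score(previous, pair[0]),
        )
        output.append(best_word)
    return output


print(normalize(["see", "you", "tmrw"]))
```

Tokens with no candidate above the similarity threshold are left untouched, which keeps the sketch from forcing a correction onto names or other legitimate out-of-vocabulary words.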