Publication | Open Access
POS Tagging of English-Hindi Code-Mixed Social Media Content
Citations: 201
References: 19
Year: 2014
Venue: Unknown
Engineering, Tagging, Part-of-speech Tagging, Corpus Linguistics, Text Mining, Applied Linguistics, Natural Language Processing, Social Media, Computational Linguistics, Language Studies, Content Analysis, Machine Translation, Language Technology, Social Multimedia Tagging, POS Tagging, Semantic Tagging, Formal Grammar, Text Processing, Linguistics
Code-mixing is frequently observed in user-generated content on social media, especially from multilingual users. The linguistic complexity of such content is compounded by the presence of spelling variations, transliteration and non-adherence to formal grammar. We describe our initial efforts to create a multi-level annotated corpus of Hindi-English code-mixed text collated from Facebook forums, and explore language identification, back-transliteration, normalization and POS tagging of this data. Our results show that language identification and transliteration for Hindi are two major challenges that impact POS tagging accuracy.