Publication | Open Access
Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters
Citations: 679
References: 31
Year: 2013
We consider the problem of part-of-speech tagging for informal, online conversational text. The study evaluates large-scale unsupervised word clustering and novel lexical features to improve POS tagging accuracy, and introduces POS annotation guidelines and a new tweet dataset. The system attains state-of-the-art results, raising Twitter POS tagging accuracy from 90% to 93%, and the word clusters reveal insights into NLP and linguistic phenomena in online conversational text. Tagging software, guidelines, and word clusters are available at http://www.ark.cs.cmu.edu/TweetNLP, and the paper is forthcoming in NAACL 2013.
We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English-language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at http://www.ark.cs.cmu.edu/TweetNLP. This paper describes release 0.3 of the “CMU Twitter Part-of-Speech Tagger” and annotated data. [This paper is forthcoming in Proceedings of NAACL 2013; Atlanta, GA, USA.]