Publication | Closed Access
Classifying latent user attributes in twitter
Citations: 645
References: 10
Year: 2010
Unknown Venue
Twitter User Language, Engineering, Social Medium Monitoring, Communication, Corpus Linguistics, Text Mining, Natural Language Processing, Computational Social Science, Social Media, Data Science, Social Media Outlets, Language Studies, Content Analysis, Social Medium Mining, Latent User Attributes, Knowledge Discovery, Social Computing, Social Medium Data, Linguistics, Informal Content
Social media outlets such as Twitter have become an important forum for peer interaction. Thus, the ability to classify latent user attributes, including gender, age, regional origin, and political orientation, solely from Twitter user language or similar highly informal content has important applications in advertising, personalization, and recommendation. This paper includes a novel investigation of stacked-SVM-based classification algorithms over a rich set of original features, applied to classifying these four user attributes. It also includes extensive analysis of features and approaches that are effective and not effective in classifying user attributes in Twitter-style informal written genres, as distinct from the other, primarily spoken, genres previously studied in the user-property classification literature. Our models, singly and in ensemble, significantly outperform baseline models in all cases. A detailed analysis of model components and features provides an often entertaining insight into distinctive language-usage variation across gender, age, regional origin, and political orientation in modern informal communication.