Publication | Closed Access
Detecting offensive tweets via topical feature discovery over a large scale Twitter corpus
Citations: 268
References: 15
Year: 2012
Unknown Venue
Abuse Detection, Engineering, Machine Learning, Social Medium Monitoring, Corpus Linguistics, Text Mining, Natural Language Processing, Computational Social Science, Social Media, Data Science, Computational Linguistics, Detects Offensive Tweets, Language Engineering, Language Studies, Content Analysis, Social Medium Mining, Hate Speech, NLP Task, Knowledge Discovery, Topical Feature Discovery, Novel Semi-supervised Approach, Offensive Tweets, Social Medium Data, Profane Language, Linguistics
In this paper, we propose a novel semi-supervised approach for detecting profanity-related offensive content in Twitter. Our approach exploits linguistic regularities in profane language via statistical topic modeling on a huge Twitter corpus, and detects offensive tweets using these automatically generated features. Our approach performs competitively with a variety of machine learning (ML) algorithms. For instance, our approach achieves a true positive rate (TP) of 75.1% over 4029 testing tweets using Logistic Regression, significantly outperforming the popular keyword matching baseline, which has a TP of 69.7%, while keeping the false positive rate (FP) at the baseline's level of about 3.77%. Our approach provides an alternative to the large-scale hand annotation efforts required by fully supervised learning approaches.
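To make the two reported metrics concrete, the following is a minimal sketch of a keyword matching baseline and of how a true positive rate and a false positive rate could be computed from its predictions. It is an illustration only, not the paper's implementation: the lexicon, the example tweets, their labels, and the function names `keyword_match` and `tp_fp_rates` are all hypothetical, and the resulting rates have no relation to the figures quoted above.

```python
from typing import Iterable, List, Tuple


def keyword_match(tweet: str, lexicon: Iterable[str]) -> bool:
    """Flag a tweet if any of its tokens appears in the lexicon (hypothetical baseline)."""
    # Lowercase each whitespace-delimited token and strip surrounding punctuation.
    tokens = {tok.strip(".,!?;:\"'").lower() for tok in tweet.split()}
    return any(word in tokens for word in lexicon)


def tp_fp_rates(y_true: List[bool], y_pred: List[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fn = sum(t and not p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Placeholder lexicon and toy data, invented for illustration (assumption).
lexicon = {"darn", "heck"}
tweets = [
    "What the heck is this",            # labeled offensive
    "Darn, I missed the bus!",          # labeled offensive
    "You are such a h3ck",              # labeled offensive, obfuscated spelling
    "Lovely weather today",             # labeled clean
    "heck of a game last night",        # labeled clean, benign use of a keyword
    "Reading a paper on topic models",  # labeled clean
]
labels = [True, True, True, False, False, False]

preds = [keyword_match(t, lexicon) for t in tweets]
tpr, fpr = tp_fp_rates(labels, preds)
print(f"TP rate: {tpr:.3f}, FP rate: {fpr:.3f}")
```

In this toy data the obfuscated spelling is missed and the benign use of a keyword is flagged, which are the two typical failure modes of plain keyword matching.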