Publication | Open Access
Cyber Hate Speech on Twitter: An Application of Machine Classification and Statistical Modeling for Policy and Decision Making
Year: 2015 | Citations: 584 | References: 28
Abuse Detection, Engineering, Computational Analysis, Public Opinion, Communication, Sentiment Analysis, Language Processing, Voted Ensemble Meta‐classifier, Text Mining, Natural Language Processing, Social Media, Data Science, Data Resources, Political Communication, Decision Making, Cyber Hate Speech, Content Analysis, Social Media Mining, Hate Speech, Knowledge Discovery, Online Harassment, Social Media Intelligence, Social Computing, Social Media Data, Arts, Machine Classification
The use of Big Data in policy and decision making is debated. The 2013 murder of Drummer Lee Rigby sparked a public reaction on Twitter that offers a case study of how online hate speech spreads. The authors collected human‑annotated tweets immediately after Rigby’s murder and trained a supervised machine‑learning classifier that uses content‑based features, such as grammatical dependencies, to detect hateful or antagonistic language about race, ethnicity, or religion. A voted ensemble combining probabilistic, rule‑based, and spatial classifiers performed best, and its outputs were used in a statistical model that forecasts the spread of cyber hate on Twitter to inform policy and decision making.
The use of “Big Data” in policy and decision making is a current topic of debate. The 2013 murder of Drummer Lee Rigby in Woolwich, London, UK led to an extensive public reaction on social media, providing the opportunity to study the spread of online hate speech (cyber hate) on Twitter. Human‐annotated Twitter data collected in the immediate aftermath of Rigby's murder were used to train and test a supervised machine learning text classifier that distinguishes hateful and/or antagonistic responses focused on race, ethnicity, or religion from more general responses. Classification features were derived from the content of each tweet, including grammatical dependencies between words to recognize “othering” phrases, incitement to respond with antagonistic action, and claims of well‐founded or justified discrimination against social groups. The classifier performed best when probabilistic, rule‐based, and spatial‐based classifiers were combined with a voted ensemble meta‐classifier. We demonstrate how the classifier's output can be robustly used in a statistical model to forecast the likely spread of cyber hate in a sample of Twitter data. The applications to policy and decision making are discussed.
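To make the idea of a voted ensemble meta‐classifier concrete, here is a minimal sketch of hard (majority) voting over three base classifiers, one from each family named in the abstract. The abstract does not say which learners were used, so the members are simple stand‐ins chosen for illustration: Bernoulli naive Bayes as the probabilistic classifier, a hand‐written rule as the rule‐based classifier, and nearest centroid as the spatial classifier. The feature names (`othering`, `incitement`, `justified_discrimination`, `mentions_group`) are hypothetical binary flags loosely modeled on the feature types in the abstract, and the training data are invented toy examples, not the study's annotated tweets.

```python
import math
from collections import Counter

# Hypothetical binary features, loosely named after the feature types in the abstract.
FEATURES = ["othering", "incitement", "justified_discrimination", "mentions_group"]

def vectorize(active):
    """Map a set of active feature names to a 0/1 feature vector."""
    return [1 if f in active else 0 for f in FEATURES]

class BernoulliNB:
    """Probabilistic member (stand-in): Bernoulli naive Bayes with Laplace smoothing."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior, self.p = {}, {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Smoothed estimate of P(feature j = 1 | class c).
            self.p[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                         for j in range(len(FEATURES))]
        return self

    def predict(self, x):
        def log_posterior(c):
            return self.log_prior[c] + sum(
                math.log(p if v else 1 - p) for v, p in zip(x, self.p[c]))
        return max(self.classes, key=log_posterior)

class RuleBased:
    """Rule-based member (stand-in): flag a tweet if an othering or incitement feature fires."""
    def fit(self, X, y):
        return self  # fixed rule, nothing to learn

    def predict(self, x):
        fired = x[FEATURES.index("othering")] or x[FEATURES.index("incitement")]
        return 1 if fired else 0

class NearestCentroid:
    """Spatial member (stand-in): assign the label of the closest class centroid."""
    def fit(self, X, y):
        self.centroids = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

class VotedEnsemble:
    """Meta-classifier: majority (hard) vote over the members' predicted labels."""
    def __init__(self, members):
        self.members = members

    def fit(self, X, y):
        for m in self.members:
            m.fit(X, y)
        return self

    def predict(self, x):
        votes = Counter(m.predict(x) for m in self.members)
        return votes.most_common(1)[0][0]

# Invented toy data: 1 = hateful/antagonistic, 0 = general response.
train = [
    ({"othering", "mentions_group"}, 1),
    ({"incitement", "mentions_group"}, 1),
    ({"justified_discrimination", "mentions_group"}, 1),
    ({"othering", "incitement"}, 1),
    ({"mentions_group"}, 0),
    (set(), 0),
    (set(), 0),
    ({"mentions_group"}, 0),
]
X = [vectorize(active) for active, _ in train]
y = [label for _, label in train]

model = VotedEnsemble([BernoulliNB(), RuleBased(), NearestCentroid()]).fit(X, y)
print(model.predict(vectorize({"othering", "mentions_group"})))  # 1
print(model.predict(vectorize(set())))  # 0
```

With three members and two labels a tie cannot occur. The vote matters when members disagree: on this toy data, a tweet with only the `justified_discrimination` and `mentions_group` flags is labeled 0 by the naive Bayes and rule members and 1 by the centroid member, so the ensemble returns 0. Soft voting (averaging class probabilities) is a common alternative, but it requires every member to output calibrated scores, which a hard rule does not.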