
Publication | Closed Access

Abusive Language Detection in Online User Content

Citations: 1.1K
References: 21
Year: 2016

TLDR

Detection of abusive language in user-generated online content has become increasingly important, yet current commercial methods that rely on blacklists and regular expressions fail to capture more subtle hate speech. The study develops a machine-learning method to detect hate speech in online user comments from two domains, outperforming a state-of-the-art deep-learning approach. The authors also created the first annotated corpus of user comments for abusive language, and they applied their detection tool to analyze abusive language over time and across settings, deepening understanding of this behavior.

Abstract

Detection of abusive language in user-generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions; however, these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine-learning-based method to detect hate speech in online user comments from two domains which outperforms a state-of-the-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.
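To illustrate the limitation the abstract attributes to blacklist- and regex-based commercial filters, the minimal sketch below (not the authors' system; the blacklist terms and function name are hypothetical placeholders) shows how exact keyword matching catches only verbatim slurs while missing obfuscated spellings and keyword-free abuse:

```python
import re

# Hypothetical blacklist (mild placeholder terms, for illustration only).
BLACKLIST = {"idiot", "stupid"}
PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(BLACKLIST))) + r")\b",
    re.IGNORECASE,
)

def blacklist_flag(comment: str) -> bool:
    """Flag a comment if any blacklisted word appears verbatim."""
    return bool(PATTERN.search(comment))

print(blacklist_flag("You are an IDIOT."))       # caught: exact keyword match
print(blacklist_flag("You are an id1ot."))       # missed: obfuscated spelling
print(blacklist_flag("People like you should leave this site."))  # missed: no keyword
```

A learned classifier over character n-grams and other features, as the paper pursues, can generalize past this exact-match brittleness, which is why the authors move beyond blacklists and regular expressions.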
