Detecting Offensive Language in Social Media to Protect Adolescent Online Safety

TLDR

Text on online social media is highly unstructured and often misspelled, which makes message-level offensive language detection inaccurate; user-level detection is more feasible but under-researched. The study proposes the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users on social media. The LSF framework distinguishes pejoratives and profanities from obscenities, uses hand-authored syntactic rules to identify name-calling harassment, and incorporates a user's writing style, structure, and cyberbullying content as features to predict how likely that user is to send offensive content. Experiments show that LSF outperforms existing methods, achieving 98.24% precision and 94.34% recall for sentence-level detection and 77.9% precision and 77.8% recall for user-level detection, with a processing speed of about 10 ms per sentence.

Abstract

Because textual content on online social media is highly unstructured, informal, and often misspelled, existing message-level approaches cannot accurately detect offensive content. User-level offensiveness detection appears more feasible, but it is an under-researched area. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potentially offensive users in social media. We distinguish the contributions of pejoratives/profanities and obscenities in determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. In particular, we incorporate a user's writing style, structure, and specific cyberbullying content as features to predict the user's potential to send out offensive content. Experimental results show that our LSF framework performs significantly better than existing methods in offensive content detection. It achieves a precision of 98.24% and a recall of 94.34% in sentence-level offensiveness detection, and a precision of 77.9% and a recall of 77.8% in user-level offensiveness detection. Meanwhile, the processing speed of LSF is approximately 10 ms per sentence, suggesting the potential for effective deployment in social media.
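
The following is a minimal sketch of the general idea, not the LSF implementation itself. It assumes a toy two-tier lexicon with made-up weights, and it stands in for the hand-authored syntactic rules with a crude token window that checks whether an offensive word appears near a second-person pronoun. The user-level score here is only a mean of sentence scores; the style and structure features mentioned above are omitted. All names, word lists, and constants (`STRONG_WORDS`, `WEAK_WORDS`, `NAME_CALLING_BOOST`, `WINDOW`) are hypothetical.

```python
import re

# Hypothetical toy lexicons and weights; the actual LSF lexicons are not given here.
STRONG_WORDS = {"jerk", "moron"}   # stand-ins for strongly offensive terms
WEAK_WORDS = {"stupid", "dumb"}    # stand-ins for weakly offensive terms
STRONG_WEIGHT = 1.0
WEAK_WEIGHT = 0.5

SECOND_PERSON = {"you", "your", "u", "ur"}
NAME_CALLING_BOOST = 2.0  # assumed intensifier when a person is addressed directly
WINDOW = 2                # assumed token distance, replacing real syntactic rules


def tokenize(sentence):
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_offensiveness(sentence):
    """Score a sentence in [0, 1] from lexicon hits and a name-calling rule."""
    tokens = tokenize(sentence)
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in STRONG_WORDS:
            weight = STRONG_WEIGHT
        elif tok in WEAK_WORDS:
            weight = WEAK_WEIGHT
        else:
            continue
        # Boost the hit if a second-person pronoun appears within the window.
        nearby = tokens[max(0, i - WINDOW): i + WINDOW + 1]
        if any(t in SECOND_PERSON for t in nearby):
            weight *= NAME_CALLING_BOOST
        score += weight
    return min(score, 1.0)


def user_offensiveness(sentences):
    """Average the sentence scores over a user's messages (content only)."""
    if not sentences:
        return 0.0
    return sum(sentence_offensiveness(s) for s in sentences) / len(sentences)
```

In this sketch, "That movie was stupid" scores lower than "You are stupid" because only the second sentence directs the term at a person, which is the intuition behind treating name-calling separately from mere word presence.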
