TLDR

Threatening and abusive language spreads rapidly on social media, so robust automatic detection systems are needed. Prior Bengali studies used Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) classifiers on Bengali Unicode characters. This study proposes an automatic detection system for threatening and abusive language, built with machine learning and NLP techniques. The system accepts both Unicode emoticons and Bengali Unicode characters as input and evaluates MNB, SVM, and a CNN–LSTM architecture. Of the three, SVM with a linear kernel achieved the highest accuracy, 78%.

Abstract

Threatening and abusive language spreads quickly through social media, and it can be controlled if we can detect and remove it. Because there are many social media platforms, such as Facebook, Twitter, and Instagram, and a huge number of users, we need a robust and effective automatic system to identify threatening and abusive language. Our proposed system uses Machine Learning and Natural Language Processing techniques to build such an automatic system. Previous research on Bengali abusive language detection used the Multinomial Naïve Bayes (MNB) and Support Vector Machine (SVM) algorithms and considered only Bengali Unicode characters. We consider both Unicode emoticons and Bengali Unicode characters as valid input. Besides the MNB and SVM algorithms, we implemented a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). Among the three algorithms, SVM with a linear kernel performed best, with 78% accuracy.
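
The input rule described in the abstract (keeping Bengali characters and emoticons as valid input) can be illustrated with a minimal sketch. The abstract does not state which code point ranges the authors used, so the ranges below are an assumption: the standard Unicode Bengali block (U+0980 to U+09FF) and the Unicode Emoticons block (U+1F600 to U+1F64F). The names `is_valid_char` and `clean_text` are hypothetical helpers introduced only for this illustration.

```python
# Assumed ranges (not taken from the paper): the Unicode "Bengali" block
# and the Unicode "Emoticons" block.
BENGALI = (0x0980, 0x09FF)
EMOTICONS = (0x1F600, 0x1F64F)


def is_valid_char(ch: str) -> bool:
    """Return True for whitespace, Bengali characters, and emoticons."""
    cp = ord(ch)
    in_bengali = BENGALI[0] <= cp <= BENGALI[1]
    in_emoticons = EMOTICONS[0] <= cp <= EMOTICONS[1]
    return in_bengali or in_emoticons or ch.isspace()


def clean_text(text: str) -> str:
    """Replace every invalid character with a space, then collapse whitespace."""
    kept = "".join(ch if is_valid_char(ch) else " " for ch in text)
    return " ".join(kept.split())
```

A filter of this kind would typically run before feature extraction, so that Latin letters, digits, and punctuation do not reach the classifier. Whether the original system drops such characters or handles them differently is not specified in the abstract.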
