Hate Speech Identification using the Hate Codes for Indonesian Tweets

Abstract

Hate speech has become a major source of negativity spreading across social media. As social media platforms have become aware of this issue, they have gradually introduced new regulations to curb its spread, e.g., by automatically blocking or suspending accounts or posts that contain hate speech. However, social media users have become more creative in expressing hate speech. To evade these regulations, users often rely on special codes when interacting with each other. This study aims to use these hate codes to identify hate speech in social media data. We used Indonesian tweets as the dataset and applied Logistic Regression, Support Vector Machine, Naïve Bayes, and Random Forest Decision Tree as classifiers. The highest F-Measure for hate speech identification, 80.71%, was obtained by combining the hate code feature with Logistic Regression as the classifier.
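The abstract does not specify how the hate code feature is computed, so the following is only a minimal sketch under one simple assumption: each tweet is mapped to a vector of counts of entries from a hate-code lexicon, a logistic regression classifier is trained on those counts, and predictions are scored with the F-Measure (the harmonic mean of precision and recall). The lexicon entries (`kode1`, `kode2`, `kode3`), the toy tweets, and all helper names are hypothetical placeholders, not the codes, data, or implementation used in the study; the classifier is a plain gradient-descent logistic regression rather than any particular library's.

```python
import math
import re

# HYPOTHETICAL hate-code lexicon: placeholder tokens only, not the study's codes.
HATE_CODES = ["kode1", "kode2", "kode3"]


def hate_code_features(text, codes=HATE_CODES):
    """Map a tweet to one count per hate code (lower-cased word tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    return [float(tokens.count(code)) for code in codes]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (probability - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x, threshold=0.5):
    """Label 1 (hate speech) when the predicted probability reaches the threshold."""
    prob = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return 1 if prob >= threshold else 0


def f_measure(y_true, y_pred):
    """F1 = 2PR / (P + R) for the positive (hate speech) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up examples (1 = hate speech, 0 = not); not from the study's dataset.
train_texts = [
    "kamu kode1 sekali",
    "kode2 semua",
    "kode3 kode1 pergi",
    "selamat pagi semua",
    "terima kasih banyak",
    "cuaca cerah hari ini",
]
labels = [1, 1, 1, 0, 0, 0]

X = [hate_code_features(t) for t in train_texts]
w, b = train_logistic_regression(X, labels)
train_preds = [predict(w, b, x) for x in X]
print("training F-Measure:", f_measure(labels, train_preds))
```

On real data the lexicon would come from an actual hate-code list, and the F-Measure should be computed on held-out tweets rather than the training set. Because the features are just numeric vectors, any of the other classifiers named above (Support Vector Machine, Naïve Bayes, Random Forest) could be trained on the same representation for comparison.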
