Toxic Comment Classification Using S-BERT Vectorization and Random Forest Algorithm

Abstract

The growing popularity of social media platforms and microblogging websites has led to an increase in the expression of views and opinions. However, conversations and debates on these platforms often give rise to toxic comments, which consist of insulting and hateful remarks. To address this issue, social media systems need to be able to recognize harmful comments. With the rising incidence of cyberbullying, it is crucial to study the classification of toxic comments using various algorithms. This study compares the effectiveness of different word and sentence embedding methods, including TF-IDF, InferSent, BERT, and T5, for toxic comment classification. It also compares results with and without SMOTE, which is used to balance the highly imbalanced dataset. The results of these models are compared and analysed. T5 embeddings with a Random Forest classifier are observed to perform best, achieving an F1-score of 0.91.
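To make the balancing step and the reported metric concrete, the following is a minimal sketch of SMOTE-style oversampling and of the F1-score, written in plain Python. It is not the study's implementation: the 2-D points stand in for real sentence embeddings, the function names `smote` and `f1_score` and the parameters `k` and `seed` are illustrative assumptions, and the embedding and Random Forest stages are left out entirely.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 2-D "embeddings": 6 non-toxic and 3 toxic comments.
non_toxic = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3],
             [0.3, 0.0], [0.2, 0.2], [0.1, 0.1]]
toxic = [[0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]

# Oversample the toxic class until both classes have the same size.
toxic_balanced = toxic + smote(toxic, n_new=len(non_toxic) - len(toxic))
```

Because each synthetic point lies on a segment between two existing minority samples, oversampling adds variety without simply duplicating comments. In practice the oversampling would typically be applied to the training split only, so that evaluation scores are computed on real, unaltered comments.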
