Roman Urdu Multi-Class Offensive Text Detection using Hybrid Features and SVM

Abstract

Hate content has become a significant issue worldwide with the growth of social networking sites. Detecting hate content in languages other than English is challenging. We propose a new technique that automatically classifies Roman Urdu comments from YouTube videos into five classes: Religious Hate, Violence Promotion, Extremist (Racist), Threat/Fear, and Neutral. We generated the dataset by scraping Roman Urdu comments from YouTube videos and having them labeled by language experts. We use N-grams and TF-IDF values for feature extraction, followed by SVM classification. Some classes have relatively few instances, so we employ SMOTE for class balancing. The developed model achieves a classification performance of 77.45% under 10-fold cross-validation. The proposed approach offers superior classification results compared with other approaches.
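To make the feature-extraction and class-balancing steps concrete, the following is a minimal standard-library sketch, not the authors' implementation. It assumes word-level unigrams and bigrams, a plain log(N/df) IDF weighting, and a simplified SMOTE-style interpolation between nearest minority-class neighbours; the function names, parameters, and toy comments are hypothetical and chosen only for illustration. The SVM classifier and the 10-fold cross-validation are omitted, since they would normally come from a machine-learning library.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased comment."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_matrix(comments, n_max=2):
    """Build a dense TF-IDF matrix (list of rows) plus the sorted vocabulary.

    Assumption: tf is the relative n-gram frequency in a comment and
    idf is log(N / df); real toolkits often use smoothed variants.
    """
    docs = [Counter(word_ngrams(c, n_max)) for c in comments]
    vocab = sorted({g for d in docs for g in d})
    n_docs = len(docs)
    # Document frequency: number of comments that contain each n-gram.
    df = Counter(g for d in docs for g in d)
    rows = []
    for d in docs:
        total = sum(d.values()) or 1
        rows.append([(d[g] / total) * math.log(n_docs / df[g]) for g in vocab])
    return rows, vocab


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows for a minority class.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours, which is the
    core idea of SMOTE (simplified here).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [r for r in minority if r is not base]
        others.sort(key=lambda r: math.dist(base, r))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (x - b) for b, x in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Invented toy comments, not taken from the paper's dataset.
    toy_comments = ["acha video hai", "video acha nahi", "bohat acha kaam"]
    features, vocabulary = tfidf_matrix(toy_comments)
    print(len(vocabulary), "n-gram features")
    # Treat the first two rows as an under-represented class and oversample it.
    extra_rows = smote_like(features[:2], n_new=2)
    print(len(extra_rows), "synthetic rows")
```

In a full pipeline, oversampling of this kind is usually applied only to the training portion of each cross-validation fold, so that synthetic rows derived from test comments do not leak into training; whether the paper does this is not stated in the abstract.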
