Publication | Closed Access
Roman Urdu Multi-Class Offensive Text Detection using Hybrid Features and SVM
17
Citations
12
References
2020
Year
Unknown Venue
Abuse Detection, Engineering, Feature Extraction, Multimodal Sentiment Analysis, Corpus Linguistics, Hate Content, Text Mining, Natural Language Processing, Data Science, Pattern Recognition, Text Recognition, Computational Linguistics, Svm Classification, Language Studies, Content Analysis, Social Medium Mining, Hybrid Features, Automatic Classification, Social Medium Data, Text Processing, Linguistics
Hate content has become a significant issue worldwide with the growth of social networking sites. Detecting hate content in languages other than English is challenging. We propose a new technique that automatically classifies Roman Urdu comments from YouTube videos into five classes: Religious Hate, Violence Promotion, Extremist (Racist), Threat/Fear, and Neutral. We generated a dataset by scraping Roman Urdu comments from YouTube videos and having them labeled by language experts. We used N-grams and TF-IDF values for feature extraction, followed by SVM classification. Some classes have relatively few instances, so we employed SMOTE for class balancing. The developed model achieves a classification performance of 77.45% under 10-fold cross-validation. The proposed approach offers superior classification results compared to other approaches.
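The feature extraction and class-balancing steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, assumes word-level unigrams and bigrams with a smoothed IDF, and replaces SMOTE's nearest-neighbour selection with interpolation between random pairs of minority samples. The function names and the toy comments are hypothetical, and the SVM step is omitted (in practice a library classifier would be trained on the resulting vectors).

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased comment."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n_max=2):
    """Build a sorted n-gram vocabulary and one dense TF-IDF vector per document."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted({g for c in counts for g in c})
    n_docs = len(docs)
    # Document frequency: iterating a Counter yields each n-gram once per document.
    df = Counter(g for c in counts for g in c)
    # Smoothed IDF (an assumption; the paper does not state its exact weighting).
    idf = {g: math.log((1 + n_docs) / (1 + df[g])) + 1 for g in vocab}
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vectors.append([c[g] / total * idf[g] for g in vocab])
    return vocab, vectors


def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic samples by interpolating between random minority pairs.

    Simplification: real SMOTE interpolates toward one of the k nearest
    neighbours of each sample rather than toward a random partner.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic


# Hypothetical, neutral placeholder comments (not taken from the paper's dataset).
docs = ["yeh video achi hai", "yeh video buri hai", "sab ko salam", "sab ko dua"]
vocab, vectors = tfidf_vectors(docs)

# Treat the last two comments as an under-represented class and oversample it.
minority = vectors[2:]
synthetic = smote_like(minority, n_new=3)
```

Each synthetic vector lies on the line segment between two existing minority vectors, which is the core idea behind SMOTE-style balancing: the minority class grows without exact duplication of its samples.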