Publication | Closed Access
Twitter Dataset for Hate Speech and Cyberbullying Detection in Indonesian Language
Citations: 49
References: 7
Year: 2019
Unknown Venue
Abuse Detection, Engineering, Multimodal Sentiment Analysis, Sentiment Analysis, Corpus Linguistics, Journalism, Text Mining, Natural Language Processing, Social Media, Data Science, Computational Linguistics, Twitter Dataset, Content Analysis, Indonesian Language, Social Medium Mining, Polarity Score, Hate Speech, Social Media Platforms, Cyberbullying, Online Harassment, Social Medium Data, Arts, Linguistics, Twitter API
During the 2019 election period in Indonesia, many cases of hate speech and cyberbullying occurred on social media platforms, including Twitter. The government attempts to filter negative content to prevent it from spreading during this period. However, detecting hate speech is not an easy task. This paper presents the process of developing a dataset that can be used to build a hate speech detection model. More than 1 million tweets were collected using the Twitter API. Basic preprocessing and a preliminary study using machine learning were carried out. The Latent Dirichlet Allocation (LDA) algorithm was used to extract a topic for each tweet, to see whether these topics can be associated with the debate themes. A pretrained sentiment analysis model was also applied to the dataset to generate a polarity score for each tweet. Among the 83,752 tweets included in the analysis step, the numbers of positive and negative tweets are almost the same.
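The per-tweet topic extraction described in the abstract can be illustrated with a small sketch. The abstract does not say which LDA implementation, inference method, number of topics, or hyperparameters were used, so everything below is an assumption: the sketch fits LDA by collapsed Gibbs sampling using only the Python standard library, the function name `fit_lda` and the values of `alpha`, `beta`, and `iterations` are hypothetical, and the tokenized tweets are invented toy data, not samples from the dataset.

```python
import random
from collections import Counter


def fit_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs is a list of token lists. Returns one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]          # tokens of doc d in topic k
    topic_word = [Counter() for _ in range(num_topics)]   # occurrences of word w in topic k
    topic_total = [0] * num_topics                        # total tokens in topic k
    assignments = []                                      # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distributions.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(num_topics)])
    return theta


# Invented, already tokenized toy tweets (not taken from the dataset).
tweets = [
    ["ekonomi", "harga", "pajak", "ekonomi"],
    ["harga", "pajak", "ekonomi"],
    ["hukum", "korupsi", "hukum", "ham"],
    ["korupsi", "ham", "hukum"],
]

theta = fit_lda(tweets, num_topics=2)

# The topic assigned to each tweet is the one with the highest probability.
dominant = [max(range(2), key=row.__getitem__) for row in theta]
for tokens, topic in zip(tweets, dominant):
    print(topic, " ".join(tokens))
```

Topic indices are arbitrary labels, so linking a topic to a debate theme still requires inspecting the most frequent words in each topic by hand. For a corpus of the size described in the abstract, an optimized library implementation would be a more practical choice than this pure-Python sampler.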