Publication | Closed Access
Detecting Spam Tweets using Character N-gram Features
Citations: 24
References: 21
Year: 2018
Unknown Venue
Abuse Detection, Engineering, Social Medium Monitoring, Corpus Linguistics, Text Mining, Natural Language Processing, Spam Filtering, Social Media, Data Science, Data Mining, Computational Linguistics, Language Studies, Social Medium Mining, Machine Translation, Knowledge Discovery, Twitter Popularity, Low Latency, Computer Science, Spam Tweets, Social Medium Data, Linguistics
Twitter's popularity has made it an important and instantaneous source of news and trending events around the world. It has attracted the attention of spammers, who post malicious content embedded in tweets and in their profile pages. Spammers use different and evolving techniques to evade traditional security mechanisms, which creates the need for robust solutions that adapt to these techniques. In this paper, we propose using a low-level character n-gram feature that avoids the use of tokenizers or any language-dependent tools. Using a publicly available dataset, we evaluate the performance of multiple machine learning classifiers with different representations of the proposed feature. Our experiments show that our approach improves on approaches that use word n-grams from tweet tokens. We also show that our technique can detect spam tweets with low latency, which is crucial in a real-time environment like Twitter.
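To make the idea of a tokenizer-free character n-gram feature concrete, the following is a minimal sketch, not the paper's implementation. The function names (`char_ngrams`, `ngram_counts`, `binary_features`), the n-gram range, and the two representations shown (raw counts and binary presence) are assumptions chosen for illustration; the abstract does not specify which representations or n values were evaluated.

```python
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Return every character n-gram of length n_min..n_max.

    The raw string is sliced directly, so spaces, punctuation, and
    symbols are kept and no tokenizer or language tool is needed.
    """
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams.append(text[i:i + n])
    return grams


def ngram_counts(text, n_min=1, n_max=3):
    """Count-based representation (assumed here for illustration)."""
    return Counter(char_ngrams(text, n_min, n_max))


def binary_features(text, vocabulary, n_min=1, n_max=3):
    """Binary presence vector over a fixed, hypothetical vocabulary."""
    present = set(char_ngrams(text, n_min, n_max))
    return [1 if gram in present else 0 for gram in vocabulary]


if __name__ == "__main__":
    tweet = "free $$$"
    print(char_ngrams(tweet, 2, 2))
    print(binary_features(tweet, ["$$", "xyz"], 2, 2))
```

Because the n-grams are taken over raw characters, obfuscated spellings and symbol runs still produce features, which is one plausible reason such a representation might hold up against evolving spam techniques. The resulting vectors could then be passed to any standard classifier.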