Publication | Closed Access
N-grams based features for Indonesian tweets classification problems
Citations: 12
References: 8
Year: 2017
Unknown Venue
Engineering, Corpus Linguistics, Text Mining, Natural Language Processing, Social Media, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Language Studies, Content Analysis, Twitter Active Users, Machine Translation, Social Medium Mining, Naive Bayes Classifier, Automatic Classification, Knowledge Discovery, Intelligent Classification, Computer Science, N-grams Words Dictionaries, Social Medium Data, Linguistics
Twitter is a popular microblogging service that allows users to write short messages of up to 140 characters. Active Twitter users in Indonesia reached 29.4 million in 2017, and they have created an enormous number of tweets, a potential data source for supervised learning. In this work, six different sets of n-gram word dictionaries were generated and used as references for creating numerical features of the tweets. We classified the tweets using k-Nearest Neighbors (k-NN) and the Naive Bayes Classifier and compared their performance using the F-measure. We also measured the classification time of each algorithm. The results show that the k-NN algorithm outperformed the Naive Bayes Classifier, reaching an F-measure of 81.2% with k = 7. In terms of classification time, however, the Naive Bayes Classifier was faster than k-NN for every value of k.
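The pipeline described in the abstract (n-gram dictionaries, numerical features, k-NN versus Naive Bayes, F-measure) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy tweets, the labels "pos" and "neg", the whitespace tokenizer, the count-based features, the Euclidean distance, the add-one smoothing, and all function names are assumptions made for illustration. The abstract does not specify how the six dictionaries were built, so the n-gram size is left as a parameter.

```python
import math
from collections import Counter

def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dictionary(texts, n):
    """Collect every distinct n-gram seen in the training texts, sorted."""
    return sorted({g for t in texts for g in word_ngrams(t, n)})

def featurize(text, dictionary, n):
    """Numerical feature vector: count of each dictionary n-gram in the text."""
    counts = Counter(word_ngrams(text, n))
    return [counts[g] for g in dictionary]

def knn_predict(x, train_X, train_y, k):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def nb_fit(train_X, train_y):
    """Multinomial Naive Bayes with add-one smoothing (an assumed variant)."""
    model = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        totals = [sum(col) for col in zip(*rows)]
        denom = sum(totals) + len(totals)
        log_prior = math.log(len(rows) / len(train_y))
        log_lik = [math.log((t + 1) / denom) for t in totals]
        model[label] = (log_prior, log_lik)
    return model

def nb_predict(x, model):
    """Pick the label with the highest log-posterior score."""
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(c * w for c, w in zip(x, log_lik))
    return max(model, key=score)

def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data; the paper's dataset and classes are not given here.
train_texts = ["saya suka film ini", "film ini bagus sekali",
               "saya benci macet", "macet parah hari ini"]
train_labels = ["pos", "pos", "neg", "neg"]

N = 1  # unigram dictionary; other values of N give other dictionaries
dictionary = build_dictionary(train_texts, N)
train_X = [featurize(t, dictionary, N) for t in train_texts]
nb_model = nb_fit(train_X, train_labels)

query = featurize("film ini bagus", dictionary, N)
knn_label = knn_predict(query, train_X, train_labels, k=3)
nb_label = nb_predict(query, nb_model)
```

On this toy data both classifiers assign the query tweet to "pos". The sketch uses k = 3 only because the toy training set has four tweets; the abstract reports its best F-measure with k = 7. Note that k-NN computes a distance to every training vector at prediction time, whereas Naive Bayes only sums precomputed log-probabilities, which is consistent with the faster classification times reported for Naive Bayes.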