Publication | Closed Access
Integrating Collocation as TF-IDF Enhancement to Improve Classification Accuracy
2019 | Unknown Venue | 30 citations | 17 references
Keywords: Engineering, Inverse Document Frequency, Psycholinguistics, Corpus Linguistics, Text Mining, Natural Language Processing, Applied Linguistics, Classification Method, Information Retrieval, Data Science, Computational Linguistics, Language Testing, Document Classification, Language Studies, Lexicon, Automatic Classification, Computational Lexicology, Knowledge Discovery, Terminology Extraction, Tf-idf Enhancement, Keyword Extraction, Term Frequency, Single Term, Linguistics
The motivation of this study is to address a weakness of Term Frequency-Inverse Document Frequency (TF-IDF) in dealing with single terms, which can sometimes be vague. A single term used for indexing may convey several interpretations. It can also be too general to have discriminating power: the individual terms "junior" and "college" are not enough to distinguish "junior college" from "college junior". This study therefore enhances TF-IDF by integrating collocation as a term feature. Collocated terms are extracted using part-of-speech (POS) tags that form specific patterns, such as adjective + noun, noun + noun, and noun + verb. Three document classifiers, RandomForest, MultinomialNB (MultiNB), and SVM, were each evaluated with both the traditional and the modified TF-IDF. The results show that integrating collocation into the TF-IDF process outperforms the traditional TF-IDF, with an increase in classification accuracy of up to 10 percent.
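The collocation idea can be illustrated with a short plain-Python sketch. It is a minimal illustration under stated assumptions, not the study's implementation. Tokens are assumed to arrive already POS-tagged; a tagger such as NLTK's `pos_tag` would normally supply the tags. Only the three patterns named above are used, matched on the first two characters of Penn Treebank tags. IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1, which is the default in scikit-learn's `TfidfTransformer`, with no vector normalization. The helper names `collocations`, `terms`, and `tfidf` are hypothetical.

```python
import math
from collections import Counter

# Assumed POS patterns, matched on coarse two-letter prefixes of Penn Treebank
# tags (JJ = adjective, NN = noun, VB = verb). The abstract lists these three
# patterns followed by "etc."; the study's full pattern set is not reproduced.
PATTERNS = {("JJ", "NN"), ("NN", "NN"), ("NN", "VB")}


def collocations(tagged):
    """Return 'w1_w2' terms for adjacent (word, tag) pairs matching a pattern."""
    found = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if (t1[:2], t2[:2]) in PATTERNS:
            found.append(f"{w1.lower()}_{w2.lower()}")
    return found


def terms(tagged, use_collocations):
    """Single terms, optionally extended with the collocated terms."""
    words = [w.lower() for w, _ in tagged]
    return words + collocations(tagged) if use_collocations else words


def tfidf(docs):
    """Weight each term as raw count * (ln((1 + N) / (1 + df)) + 1)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


if __name__ == "__main__":
    # Hand-tagged toy documents (hypothetical tags chosen for illustration).
    doc_a = [("junior", "JJ"), ("college", "NN")]  # "junior college"
    doc_b = [("college", "NN"), ("junior", "NN")]  # "college junior"

    plain = tfidf([terms(doc_a, False), terms(doc_b, False)])
    enhanced = tfidf([terms(doc_a, True), terms(doc_b, True)])
    print(plain[0] == plain[1])        # True: single terms cannot tell them apart
    print(enhanced[0] == enhanced[1])  # False: collocations separate them
```

With single terms alone, "junior college" and "college junior" yield identical vectors. Adding the collocated terms `junior_college` and `college_junior` separates them. Vectors built this way could then be fed to classifiers such as the RandomForest, MultinomialNB, and SVM models compared in the study.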