Publication | Closed Access
An improved TF-IDF weights function based on information theory
Citations: 24
References: 5
Year: 2010
Unknown Venue
Engineering, Tf-idf Weights, Corpus Linguistics, Text Mining, Natural Language Processing, Classification Method, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Computational Linguistics, Document Classification, Text Classification, Language Studies, Information Theory, Automatic Classification, Knowledge Discovery, Intelligent Classification, Computer Science, Vector Space Model, Linguistics
The Vector Space Model (VSM) is currently a typical method for describing text features in text classification. It adopts TF-IDF weights to compute the term weight in each dimension of the text feature vector. However, TF-IDF considers only the relationship between a term and the whole text and neglects the relationship between different terms. To address this problem, an improved TF-IDF weighting function is proposed that uses the distribution of terms among classes and inside a class. Experiments show that the improved method is feasible and effective and that it greatly improves text categorization accuracy.
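The abstract does not give the improved formula, so the following is only a minimal sketch of the baseline it builds on: classic TF-IDF, computed as term frequency times log(N / document frequency). The second function, `class_concentration`, is a hypothetical illustration of what "distribution information among classes" could look like; it is an assumption made for this example and not the paper's weighting function. The toy documents and labels are likewise invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Classic TF-IDF for tokenized documents: tf(t, d) * log(N / df(t))."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def class_concentration(docs, labels, term):
    """Hypothetical factor (NOT the paper's formula): the share of documents
    containing `term` that belong to the term's most frequent class."""
    per_class = Counter(
        label for doc, label in zip(docs, labels) if term in doc
    )
    total = sum(per_class.values())
    return max(per_class.values()) / total if total else 0.0


# Invented toy corpus, for illustration only.
docs = [
    ["ball", "goal", "team"],
    ["team", "goal", "match"],
    ["stock", "market", "team"],
    ["stock", "price", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

weights = tf_idf(docs)
```

In this toy corpus "team" appears in both classes, so plain TF-IDF lowers its weight only because it is frequent overall. A class-aware factor such as the hypothetical one above would additionally separate terms concentrated in a single class ("goal") from terms spread across classes ("team"), which is the kind of information the abstract says plain TF-IDF ignores.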