Publication | Closed Access
Improved information gain-based feature selection for text categorization
Citations: 42
References: 7
Year: 2014
Unknown Venue
Natural Language Processing, Engineering, Information Retrieval, Data Science, Data Mining, Text Categorization, Term Frequency Information, Predictive Analytics, Automatic Classification, Knowledge Discovery, Feature Selection, Keyword Extraction, Document Classification, Intelligent Classification, Term Frequency, Content Analysis, Corpus Linguistics, Text Mining
Feature selection (FS) is one of the most important issues in text categorization (TC). Empirical studies show that information gain (IG) is an effective FS method. However, traditional IG pays little attention to term frequency and also takes into account the case in which a term does not appear, so its results are not ideal. In this paper, we put forward an improved information gain-based feature selection method that uses term frequency information and a balance factor (IGTB) for statistical machine learning-based text categorization. The method aims to pick out the key feature items of a text corpus precisely. Experiments on the Reuters-21578 and WebKB collections show that it effectively improves categorization accuracy compared with conventional information gain and other methods.
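For orientation, the conventional IG baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of the textbook document-level definition, IG(t) = H(C) - H(C | t present or absent), and not the paper's IGTB method, whose term frequency weighting and balance factor are not specified in the abstract. The function names `entropy` and `information_gain` and the toy corpus are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """Conventional IG of `term` with respect to the category labels.

    docs   : list of token lists, one per document (hypothetical toy format)
    labels : list of category labels, aligned with docs
    Only presence/absence of the term is used, so term frequency inside a
    document is ignored, which is the limitation the abstract points out.
    """
    n = len(docs)
    if n == 0:
        return 0.0
    present = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    absent = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n_present = sum(present.values())
    n_absent = n - n_present

    h_c = entropy(list(Counter(labels).values()))
    # The second summand covers documents where the term does NOT appear,
    # i.e. the term-absence case mentioned in the abstract.
    h_c_given_t = (n_present / n) * entropy(list(present.values())) \
        + (n_absent / n) * entropy(list(absent.values()))
    return h_c - h_c_given_t


# Toy corpus, invented for illustration only.
docs = [
    ["wheat", "price"],
    ["wheat", "export"],
    ["course", "student"],
    ["student", "faculty"],
]
labels = ["grain", "grain", "edu", "edu"]

# Rank the vocabulary by IG, highest first.
vocab = sorted({tok for doc in docs for tok in doc})
ranking = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
print(ranking)
```

In this toy corpus, "wheat" and "student" each separate the two categories perfectly and so receive the maximum score, while a term that occurs in only one document scores lower. A feature selector would keep the top-ranked terms and discard the rest.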