Publication | Closed Access
An Improved Approach to Weighting Terms in Text
Citations: 33
References: 0
Year: 2000
Venue: Unknown
Engineering, Intelligent Information Retrieval, Part-of-speech Tagging, Text Representation, Corpus Linguistics, Text Mining, Natural Language Processing, Text Retrieval, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Language Studies, Content Analysis, Machine Translation, Knowledge Discovery, Terminology Extraction, Vector Space Model, Improved Approach, Keyword Extraction, Term Frequency, Linguistics
Text representation is a fundamental problem in Information Retrieval tasks such as text retrieval, automatic summarization, and search engines. tf.idf (term frequency, inverse document frequency), one of the term weighting schemes of the Vector Space Model, is a popular text representation that produces good results in Information Retrieval. The proportion in which terms are distributed across the text collection is one of the most important factors in expressing the content of a text, but tf.idf cannot capture it. This paper therefore proposes an improved approach, named tf.idf.IG, which remedies this defect using Information Gain from Information Theory. Used as an additional factor in the term weighting scheme, the Information Gain of a term effectively weights the proportion of its distribution. In text classification, the tf.idf.IG scheme proposed in this paper outperforms the original tf.idf.
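The abstract does not give the exact tf.idf.IG formula, so the following is only a minimal sketch under stated assumptions: the three factors are multiplied, idf is the plain log(N / df), and Information Gain is the standard class-conditional definition IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t) computed over labeled documents. The function names (`entropy`, `information_gain`, `tf_idf_ig`) and the token-list input format are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Standard IG of a term with respect to the class labels (assumed definition)."""
    classes = sorted(set(labels))
    n = len(docs)
    # Class counts among documents that contain / do not contain the term.
    with_t = Counter(lab for doc, lab in zip(docs, labels) if term in doc)
    without_t = Counter(lab for doc, lab in zip(docs, labels) if term not in doc)
    n_t = sum(with_t.values())
    n_not = n - n_t
    h_c = entropy([labels.count(c) for c in classes])
    h_t = entropy([with_t[c] for c in classes])
    h_not = entropy([without_t[c] for c in classes])
    return h_c - (n_t / n) * h_t - (n_not / n) * h_not


def tf_idf_ig(docs, labels):
    """Per-document term weights, assuming weight = tf * idf * IG.

    docs   -- list of token lists
    labels -- list of class labels, one per document
    """
    n = len(docs)
    vocab = {t for doc in docs for t in doc}
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    ig = {t: information_gain(t, docs, labels) for t in vocab}
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) * ig[t] for t in tf})
    return weights
```

Under these assumptions, a term that appears in every document receives weight zero from both idf and IG, while a term confined to a single class is boosted relative to one that merely has low document frequency. The paper's actual scheme may normalize or combine the factors differently.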