Publication | Closed Access
A theory of term importance in automatic text analysis
Citations: 388
References: 12
Year: 1975
Engineering, Computational Analysis, Corpus Linguistics, Text Mining, Content Mining, Applied Linguistics, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Document Classification, Corpus Analysis, Language Studies, Content Analysis, Discrimination Value Analysis, Automatic Indexing, Knowledge Discovery, Term Importance, Terminology Extraction, Web Text Mining, Information Extraction, Keyword Extraction, Linguistics
Statistical and probabilistic methods have been widely explored for automatic indexing and content analysis, but many are ineffective, and the more refined approaches are computationally unattractive. The authors introduce discrimination value analysis, which ranks words by how much they increase the average separation between the documents of a collection; the best words are those that produce the greatest separation. The method is computationally simple and applies to single words, phrases, and thesaurus categories. Experimental results demonstrate the technique's effectiveness.
Abstract
A good deal of work has been done over the years in an attempt to use statistical or probabilistic techniques as a basis for automatic indexing and content analysis (1–10). Unfortunately, many of these methods are lacking in effectiveness, and the more refined procedures are computationally unattractive. A new technique, known as discrimination value analysis, ranks the text words in accordance with how well they are able to discriminate the documents of a collection from each other; that is, the value of a term depends on how much the average separation between individual documents changes when the given term is assigned for content identification. The best words are those which achieve the greatest separation. The discrimination value analysis is computationally simple, and it assigns a specific role in content analysis to single words, juxtaposed words and phrases, and word groups or thesaurus categories. Experimental results are given showing the effectiveness of the technique.
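The idea in the abstract, scoring a term by how much the average separation between documents changes when that term is used, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or how the average is computed, so the sketch assumes documents are term-weight vectors, uses cosine similarity, and averages over all document pairs. The function names (`cosine`, `average_similarity`, `discrimination_values`) and the toy collection are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_similarity(docs):
    """Mean pairwise similarity; a lower value means the documents are better separated."""
    n = len(docs)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return sum(cosine(docs[i], docs[j]) for i, j in pairs) / len(pairs)


def discrimination_values(docs):
    """Assumed scoring: (average similarity without term k) - (average similarity with all terms).

    A positive score means that assigning term k spreads the documents apart;
    a negative score means the term pulls them together.
    """
    base = average_similarity(docs)
    n_terms = len(docs[0])
    values = []
    for k in range(n_terms):
        # Drop column k from every document vector and re-measure the average similarity.
        without_k = [[w for t, w in enumerate(doc) if t != k] for doc in docs]
        values.append(average_similarity(without_k) - base)
    return values


# Toy collection (hypothetical): term 0 occurs in every document,
# terms 1 to 3 each occur in exactly one document.
docs = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
dv = discrimination_values(docs)
ranking = sorted(range(len(dv)), key=lambda k: dv[k], reverse=True)
```

In this toy collection the term shared by every document gets a negative score (about -0.5), while each term unique to one document gets a positive score, so the unique terms rank ahead of the shared one. Averaging over all pairs costs quadratic time in the number of documents per term; it is used here only for clarity.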