Publication | Closed Access
Evaluating the utility of statistical phrases and latent semantic indexing for text classification
Citations: 21 | References: 12 | Year: 2003
Unknown Venue
Keywords: Engineering; Intelligent Information Retrieval; Latent Semantic Indexing; Statistical Phrases; Textual Information; Corpus Linguistics; Text Mining; Natural Language Processing; Information Retrieval; Data Science; Data Mining; Computational Linguistics; Document Classification; Text Classification; Phrase Terms; Language Studies; Content Analysis; Automatic Classification; Knowledge Discovery; Terminology Extraction; Information Extraction; Vector Space Model; Keyword Extraction; Window Phrases; Linguistics
The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques to extract both adjacent and window phrases from documents. We find that adding phrase terms yields only a very limited improvement once good performance has already been achieved with single-word terms, even when SVD/LSI is used as the dimensionality reduction method.
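To make the distinction between adjacent and window phrases concrete, here is a minimal Python sketch. The abstract does not give the paper's exact definitions or its statistical selection criterion, so everything here is an assumption for illustration: the function names `phrase_counts` and `frequent_phrases`, the `max_gap` and `min_count` parameters, whitespace tokenization, and the use of a raw frequency threshold as the "statistical" filter.

```python
from collections import Counter


def phrase_counts(docs, max_gap=1):
    """Count ordered word pairs whose positions differ by at most max_gap.

    max_gap=1 yields adjacent phrases (plain bigrams); a larger max_gap
    yields window phrases, where other words may sit between the pair.
    This definition is an assumption, not taken from the paper.
    """
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization
        for i, first in enumerate(tokens):
            last = min(i + 1 + max_gap, len(tokens))
            for j in range(i + 1, last):
                counts[(first, tokens[j])] += 1
    return counts


def frequent_phrases(docs, max_gap=1, min_count=2):
    """Keep pairs seen at least min_count times (a hypothetical filter)."""
    counts = phrase_counts(docs, max_gap)
    return {pair for pair, n in counts.items() if n >= min_count}


docs = [
    "latent semantic indexing reduces dimensionality",
    "latent semantic indexing for text classification",
    "semantic text indexing",
]

adjacent = frequent_phrases(docs, max_gap=1)
window = frequent_phrases(docs, max_gap=2)
```

In this toy corpus the window setting picks up the pair `("latent", "indexing")`, which the adjacent setting misses because another word sits between the two. Phrases selected this way could then be appended to the single-word vocabulary as extra dimensions of the document vectors before any SVD/LSI step.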