Evaluating the utility of statistical phrases and latent semantic indexing for text classification

Year: 2003
Citations: 21
References: 12

Abstract

The term-based vector space model is a prominent technique for retrieving textual information. In this paper we examine the usefulness of phrases as terms in vector-based document classification. We focus on statistical techniques for extracting both adjacent and window phrases from documents. We find that adding phrase terms yields only a very limited improvement once single-word terms already achieve good performance, even when SVD/LSI is used as the dimensionality reduction method.
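
To make the distinction between adjacent and window phrases concrete, the following is a minimal sketch and not the authors' method. It assumes that an adjacent phrase is an ordered pair of neighbouring words, that a window phrase is an unordered pair of words co-occurring within a fixed number of tokens, and that a phrase is kept when it appears in at least a minimum number of documents. The function names, the `window` and `min_doc_freq` parameters, and the toy documents are all hypothetical.

```python
from collections import Counter


def adjacent_phrases(tokens):
    """Return every ordered pair of directly neighbouring tokens."""
    return list(zip(tokens, tokens[1:]))


def window_phrases(tokens, window=3):
    """Return unordered pairs of distinct tokens within a span of `window` tokens.

    Assumption: word order inside the window is ignored, so each pair is
    stored in sorted order.
    """
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pairs.append(tuple(sorted((left, right))))
    return pairs


def frequent_phrases(docs, extractor, min_doc_freq=2):
    """Keep phrases that occur in at least `min_doc_freq` documents.

    Assumption: document frequency is the statistical filter; other
    criteria could be used instead.
    """
    doc_freq = Counter()
    for doc in docs:
        # Count each phrase at most once per document.
        doc_freq.update(set(extractor(doc.lower().split())))
    return {phrase for phrase, n in doc_freq.items() if n >= min_doc_freq}


# Hypothetical toy corpus, for illustration only.
docs = [
    "latent semantic indexing reduces dimensions",
    "latent semantic indexing for text classification",
    "text classification with phrases",
]

adjacent = frequent_phrases(docs, adjacent_phrases)
windowed = frequent_phrases(docs, window_phrases)
```

The selected phrases could then be appended to the single-word vocabulary as extra dimensions of the document vectors before any dimensionality reduction is applied.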
