Publication | Closed Access
Using LSI for text classification in the presence of background text
Citations: 144
References: 17
Year: 2001
Unknown Venue
Engineering, Latent Semantic Indexing, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Text Recognition, Computational Linguistics, Document Classification, Text Classification, Language Studies, Automatic Classification, Knowledge Discovery, Terminology Extraction, Intelligent Classification, Text Indexing, Background Text, Vector Space Model, Text Processing, Linguistics
This paper presents work that uses Latent Semantic Indexing (LSI) for text classification. In addition to relying on labeled training data, we improve classification accuracy by also using unlabeled data and other forms of available "background" text in the classification process. Rather than performing LSI's singular value decomposition (SVD) solely on the training data, we use an expanded term-by-document matrix that includes both the labeled data and any available, relevant background text. We report the performance of this approach on data sets both with and without the background text, and compare our work to other approaches that can incorporate unlabeled data and other background text in the classification process.
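The idea of running the SVD on an expanded term-by-document matrix can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`term_doc_matrix`, `lsi_classify`), the raw term-frequency weighting, the whitespace tokenizer, the choice of `k`, and the nearest-neighbor decision rule by cosine similarity are all assumptions made for illustration, and the toy documents are invented.

```python
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term-frequency matrix: one row per term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:  # out-of-vocabulary tokens are ignored
                X[index[tok], j] += 1
    return X


def lsi_classify(train_docs, train_labels, background_docs, query, k=2):
    """Label `query` by its nearest training document in an LSI space
    built from the training documents plus the background text.
    (Hypothetical helper; the decision rule is an assumption.)"""
    all_docs = list(train_docs) + list(background_docs)
    vocab = sorted({tok for d in all_docs for tok in d.lower().split()})

    # Expanded term-by-document matrix: training columns first, then background.
    X = term_doc_matrix(all_docs, vocab)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # never keep more factors than the rank
    U_k, s_k = U[:, :k], s[:k]

    # Reduced coordinates of the training documents (rows of V_k).
    train_vecs = Vt[:k, :len(train_docs)].T

    # Fold the query into the same space: q_hat = q^T U_k S_k^{-1}.
    q = term_doc_matrix([query], vocab)[:, 0]
    q_hat = (q @ U_k) / s_k
    q_norm = np.linalg.norm(q_hat)
    if q_norm < 1e-12:
        return None  # the query has no representation in the reduced space

    # Cosine similarity between the query and each training document.
    norms = np.maximum(np.linalg.norm(train_vecs, axis=1), 1e-12)
    sims = (train_vecs @ q_hat) / (norms * q_norm)
    return train_labels[int(np.argmax(sims))]


train = ["stock bond", "cat dog"]
labels = ["finance", "animal"]
background = ["stock equity", "cat kitten"]  # unlabeled background text

print(lsi_classify(train, labels, background, "kitten"))
print(lsi_classify(train, labels, [], "kitten"))
```

In this toy setup the word "kitten" never occurs in the labeled data, so without background text the query cannot be placed in the reduced space and the sketch returns `None`. With the background documents included, "kitten" co-occurs with "cat", which lets the query land near the labeled "cat dog" document.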