Publication | Closed Access
The limitations of term co-occurrence data for query expansion in document retrieval systems
Citations: 260
References: 16
Year: 1991
Term Co-occurrence Data, Engineering, Intelligent Information Retrieval, Query Model, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Document Retrieval Systems, Computational Linguistics, Query Expansion, Language Studies, Search Technology, Knowledge Discovery, Text Indexing, Keyword Search, Cooccurrence Data, Query Analysis, Term Cooccurrence Data, Linguistics, Interactive Information Retrieval
Term cooccurrence data has been extensively used in document retrieval systems for the identification of indexing terms that are similar to those that have been specified in a user query: these similar terms can then be used to augment the original query statement. Despite the plausibility of this approach to query expansion, the retrieval effectiveness of the expanded queries is often no greater than, or even less than, the effectiveness of the unexpanded queries. This article demonstrates that the similar terms identified by cooccurrence data in a query expansion system tend to occur very frequently in the database that is being searched. Unfortunately, frequent terms tend to discriminate poorly between relevant and nonrelevant documents, and the general effect of query expansion is thus to add terms that do little or nothing to improve the discriminatory power of the original query. © 1991 John Wiley & Sons, Inc.
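The effect described in the abstract can be illustrated with a minimal sketch. It is not the article's method: the toy corpus, the function names, and the use of raw document-level co-occurrence counts as the similarity measure are all assumptions made for illustration (the abstract does not state which similarity coefficient was used). The sketch expands a query term with its strongest co-occurring neighbours and then compares their inverse document frequency (IDF) with that of a rare term.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus: each document is a list of indexing terms.
DOCS = [
    ["retrieval", "system", "data"],
    ["retrieval", "system", "query"],
    ["retrieval", "data", "index"],
    ["system", "data", "thesaurus"],
    ["system", "data"],
    ["query", "data"],
]

def document_frequencies(docs):
    """Map each term to the number of documents that contain it."""
    df = defaultdict(int)
    for doc in docs:
        for term in set(doc):
            df[term] += 1
    return dict(df)

def cooccurrence_counts(docs):
    """Map each ordered pair (a, b), a != b, to the number of documents containing both."""
    co = defaultdict(int)
    for doc in docs:
        terms = set(doc)
        for a in terms:
            for b in terms:
                if a != b:
                    co[(a, b)] += 1
    return dict(co)

def expand(query_term, docs, k=2):
    """Return the k terms sharing the most documents with query_term.

    Raw co-occurrence count is an assumed similarity measure; ties are
    broken alphabetically so the result is deterministic.
    """
    co = cooccurrence_counts(docs)
    neighbours = {b: n for (a, b), n in co.items() if a == query_term}
    return sorted(neighbours, key=lambda t: (-neighbours[t], t))[:k]

def idf(term, docs):
    """Inverse document frequency: lower values mean weaker discrimination."""
    return math.log(len(docs) / document_frequencies(docs)[term])

if __name__ == "__main__":
    df = document_frequencies(DOCS)
    for term in expand("retrieval", DOCS):
        print(term, "df =", df[term], "idf =", round(idf(term, DOCS), 3))
    print("index", "df =", df["index"], "idf =", round(idf("index", DOCS), 3))
```

In this toy corpus the two expansion terms chosen for "retrieval" are also the two most frequent terms in the collection, so their IDF is far lower than that of a rare term such as "index". A real evaluation would use a normalised similarity coefficient and relevance judgements; the sketch only shows the direction of the bias.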