Publication | Closed Access
A formal study of information retrieval heuristics
Citations: 350
References: 18
Year: 2004
Venue: unknown
Engineering, Intelligent Information Retrieval, Information Retrieval Methods, Semantic Web, Corpus Linguistics, Text Mining, Natural Language Processing, Necessary Heuristics, Information Retrieval, Data Science, Data Mining, Retrieval Heuristics, Relevance Feedback, Intelligent Searching, Query Expansion, Search Technology, Knowledge Discovery, Computer Science, Formal Study, Automated Reasoning, Interactive Information Retrieval
Empirical studies of information retrieval methods show that good retrieval performance is closely related to the use of various retrieval heuristics, such as TF-IDF weighting. One basic research question is thus what exactly these necessary heuristics are that seem to cause good retrieval performance. In this paper, we present a formal study of retrieval heuristics. We formally define a set of basic desirable constraints that any reasonable retrieval function should satisfy, and check these constraints on a variety of representative retrieval functions. We find that none of these retrieval functions satisfies all the constraints unconditionally. Empirical results show that when a constraint is not satisfied, it often indicates non-optimality of the method, and when a constraint is satisfied only for a certain range of parameter values, the method's performance tends to be poor when the parameter is outside that range. In general, we find that the empirical performance of a retrieval formula is tightly related to how well it satisfies these constraints. Thus the proposed constraints provide a good explanation of many empirical observations and make it possible to evaluate any existing or new retrieval formula analytically.
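To make the idea of checking a retrieval function against a constraint concrete, here is a minimal Python sketch. The abstract does not spell out the constraints, so the one tested here is only an assumed example of the kind it describes: for a single-term query and documents of equal length, a document with more occurrences of the query term should score strictly higher. Both scoring functions, their parameter names, and the numeric settings are hypothetical illustrations, not the formulas analyzed in the paper, and the check is numeric rather than the analytical check the abstract refers to.

```python
import math


def tfidf_score(tf, doc_len, avg_doc_len, df, n_docs, s=0.2):
    """Illustrative length-normalized TF-IDF score for a single-term query.

    The exact form and the slope parameter `s` are assumptions for this sketch.
    """
    if tf == 0:
        return 0.0
    tf_part = 1.0 + math.log(1.0 + math.log(tf))    # sublinear growth in tf
    norm = (1.0 - s) + s * doc_len / avg_doc_len    # document length normalization
    idf = math.log((n_docs + 1.0) / df)             # rarer terms weigh more
    return tf_part / norm * idf


def capped_score(tf, doc_len, avg_doc_len, df, n_docs):
    """Deliberately flawed variant: term counts above 3 earn no extra credit."""
    return tfidf_score(min(tf, 3), doc_len, avg_doc_len, df, n_docs)


def satisfies_tf_constraint(score_fn, max_tf=50, **fixed):
    """Numerically check that the score rises strictly with tf, all else fixed."""
    scores = [score_fn(tf=tf, **fixed) for tf in range(1, max_tf + 1)]
    return all(a < b for a, b in zip(scores, scores[1:]))


# Hypothetical collection statistics, held constant across the comparison.
settings = dict(doc_len=100, avg_doc_len=120, df=10, n_docs=1000)

print(satisfies_tf_constraint(tfidf_score, **settings))
print(satisfies_tf_constraint(capped_score, **settings))
```

A numeric sweep like this can only reveal violations over the tested range; it cannot prove that a function satisfies a constraint for all inputs, which is why an analytical check is the stronger tool.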