Relevance weighting using distance between term occurrences

Abstract

Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for future experimental work. Based on the formalism, approaches to issues such as scoring partial spans, treatment of repeated term occurrences within spans, and the importance of ordering are proposed. Consideration is given to the practical application of the formalism to both locating and scoring concept intersections and to locating phrases (with an estimate of confidence) despite intervening or substituted words.

1 Introduction

The idea that the relative positions of query terms within a document may supply information about relevance arose nearly forty years ago. As early as 1958, Luhn [6] wrot...
