Concepedia

TLDR

Search engine click logs are a valuable source of relevance information, but they are biased by presentation order: the probability of a click depends on a document’s position in the results page. This study explains that position bias by modeling how click probability depends on position. The authors propose four hypotheses about how position bias arises, gather a large body of click data by perturbing the ranking of a major search engine, and test which hypothesis best explains the observed position effects, with a simple logistic regression model as a point of comparison. Simple position models, in which some users click indiscriminately on rank 1 or attention simply decays over ranks, do not explain the data well. A cascade model, in which users scan results from top to bottom and leave as soon as they see a worthwhile document, best explains position bias in early ranks.

Abstract

Search engine click logs provide an invaluable source of relevance information, but this information is biased. A key source of bias is presentation order: the probability of click is influenced by a document's position in the results page. This paper focuses on explaining that bias, modelling how probability of click depends on position. We propose four simple hypotheses about how position bias might arise. We carry out a large data-gathering effort, where we perturb the ranking of a major search engine, to see how clicks are affected. We then explore which of the four hypotheses best explains the real-world position effects, and compare these to a simple logistic regression model. The data are not well explained by simple position models, where some users click indiscriminately on rank 1 or there is a simple decay of attention over ranks. A 'cascade' model, where users view results from top to bottom and leave as soon as they see a worthwhile document, is our best explanation for position bias in early ranks.
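
The cascade idea described above can be illustrated with a minimal sketch. It rests on assumptions that are not spelled out in the abstract: each document has a fixed probability of being judged worthwhile once it is viewed, a worthwhile document is clicked, and the user leaves after that click. The function name `cascade_click_probs` and the example probabilities are hypothetical, chosen only for illustration.

```python
from typing import List


def cascade_click_probs(worthwhile: List[float]) -> List[float]:
    """Click probability at each rank under a cascade-style model.

    worthwhile[i] is the assumed probability that the document shown at
    rank i + 1 is judged worthwhile once the user actually views it.
    """
    probs = []
    p_still_scanning = 1.0  # chance the user has not yet left the page
    for r in worthwhile:
        # A click at this rank requires reaching it and finding it worthwhile.
        probs.append(p_still_scanning * r)
        # The user only moves on if this document was not worthwhile.
        p_still_scanning *= 1.0 - r
    return probs


if __name__ == "__main__":
    # Hypothetical values: the same two documents shown in both orders.
    print(cascade_click_probs([0.2, 0.8]))
    print(cascade_click_probs([0.8, 0.2]))
```

In this sketch the same document receives far fewer clicks at rank 2 when a stronger document sits above it, because most users have already left. That is the kind of position effect that perturbing a ranking and comparing click rates is meant to expose.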
