Stemming algorithms: A case study for detailed evaluation

TLDR

Information retrieval experiments are typically evaluated using average precision and recall, and decisions about technique superiority rely solely on these metrics. The study argues that average performance metrics must be statistically validated and that detailed query‑level analysis can reveal additional insights. The authors conduct a case study of stemming algorithms, presenting novel evaluation methods and demonstrating their effectiveness. © 1996 John Wiley & Sons, Inc.

Abstract

The majority of information retrieval experiments are evaluated by measures such as average precision and average recall. Fundamental decisions about the superiority of one retrieval technique over another are made solely on the basis of these measures. We claim that average performance figures need to be validated with a careful statistical analysis and that there is a great deal of additional information that can be uncovered by looking closely at the results of individual queries. This article is a case study of stemming algorithms which describes a number of novel approaches to evaluation and demonstrates their value. © 1996 John Wiley & Sons, Inc.
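As a minimal sketch of what validating an average with a per-query analysis can look like, the example below runs a two-sided exact sign test over paired per-query scores. The choice of the sign test, the function name `sign_test`, and the score values are illustrative assumptions and are not taken from the article.

```python
from math import comb


def sign_test(scores_a, scores_b):
    """Two-sided exact sign test on paired per-query scores.

    Returns (wins_a, wins_b, p_value). Tied queries are dropped,
    which is the usual convention for the sign test.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired query by query")
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return wins_a, wins_b, 1.0
    # Probability of a split at least this lopsided if each system
    # were equally likely to win any given query.
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return wins_a, wins_b, min(1.0, 2 * tail)


# Hypothetical per-query average precision for two systems.
system_a = [0.9, 0.1, 0.1, 0.1]
system_b = [0.1, 0.2, 0.2, 0.2]

mean_a = sum(system_a) / len(system_a)
mean_b = sum(system_b) / len(system_b)
wins_a, wins_b, p_value = sign_test(system_a, system_b)
print(f"mean A={mean_a:.3f}, mean B={mean_b:.3f}")
print(f"A wins {wins_a} queries, B wins {wins_b}, p={p_value:.3f}")
```

In this made-up data, system A has the higher average only because of a single query, while system B wins the other three. That is the kind of pattern an average alone hides and a query-level comparison exposes.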
