Concepedia

Abstract

Cristian Moral, Angelica de Antonio, Ricardo Imbert and Jaime Ramirez
Escuela Tecnica Superior de Ingenieros Informaticos, Universidad Politecnica de Madrid, Spain

Background. During the last fifty years, improved information retrieval techniques have become necessary because of the huge amount of information people have available, which continues to increase rapidly due to the use of new technologies and the Internet. Stemming is one of the processes that can improve information retrieval in terms of accuracy and performance.

Aim. This paper provides a detailed assessment of the current status of the stemming process framed in an information retrieval application field by tracing its historical evolution.

Method. Papers presenting the first approaches for stemming were reviewed to extract their main features, benefits and drawbacks. Additionally, papers dealing with stemmers for non-English languages or with some more recent proposals were also consulted and compiled. Finally, experimental papers defining the most well-known methods and metrics aimed at evaluating and classifying stemmers were also taken into account to expose their contributions and results.

Results. Even if not all researchers agree on the benefits and drawbacks of using stemming in an information retrieval process in general terms, many of them agree on its benefits in specific contexts, such as when the language is highly inflective, when documents are short or when there is limited space for storing data. Some researchers also state that the nature of the documents can influence the performance and the accuracy of the stemmer.

Conclusions. Despite many researchers having investigated this field over many years, there are still some open questions, such as how to evaluate a stemmer independently of the information retrieval process, or how much a stemmer improves an information retrieval application in terms of speed. As a summary, some guidelines are also provided to help
