Concepedia

TLDR

The rapid growth of online information has made efficient access essential, leading to the development of recommender systems that use user–item interactions and contextual data to generate personalized item lists. This paper reviews evaluation metrics for recommendation algorithms. The authors survey classical and modern recommendation algorithms, comparing their performance across five benchmark datasets using various evaluation metrics. Experiments reveal that no single algorithm dominates across all metrics, and performance varies substantially across datasets, underscoring the need to align evaluation criteria with application goals.

Abstract

Due to the explosion of available information on the Internet, effective means of accessing and processing it have become vital. Recommender systems have been developed to help users find what they may be interested in and to help business owners sell their products more efficiently; they have attracted much attention in both academia and industry. A recommender algorithm takes into account user–item interactions, i.e., the rating (or purchase) history of users on items, together with their contextual information, if available. It then provides a list of potential items for each target user, such that the user is likely to rate (or purchase) them positively. In this paper, we review evaluation metrics used to assess the performance of recommendation algorithms. We also survey a number of classical and modern recommendation algorithms and compare their performance in terms of different evaluation metrics on five benchmark datasets. Our experiments show that there is no golden recommendation algorithm that performs best on all evaluation metrics, and we find large variability across the datasets. This indicates that one should carefully consider the evaluation criteria when choosing a recommendation algorithm for a particular application.
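
The abstract compares algorithms "in terms of different evaluation metrics" without naming them; as a minimal sketch of what such ranking metrics typically look like, the following computes precision@k and recall@k for a recommended list against a user's held-out relevant items. The function names and example data are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch of two common top-k evaluation metrics for
# recommender systems (not the paper's specific metric suite).

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k list."""
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant) if relevant else 0.0

# Hypothetical ranked output of a recommender vs. the user's held-out items.
recommended = ["i3", "i7", "i1", "i9", "i4"]
relevant = {"i1", "i4", "i8"}

print(precision_at_k(recommended, relevant, 3))  # 1 hit in top-3 -> 0.333...
print(recall_at_k(recommended, relevant, 3))     # 1 of 3 relevant found -> 0.333...
```

Metrics like these reward different behaviors (precision favors short accurate lists, recall favors coverage), which is one reason no single algorithm dominates on every metric.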
