Publication | Closed Access
ClustKNN: A Highly Scalable Hybrid Model- & Memory-Based CF Algorithm
106
Citations
21
References
2006
Year
Unknown Venue
Large-scale Global Optimization, Engineering, Machine Learning, Advanced Computing, Text Mining, Intuitive Algorithm, Information Retrieval, Data Science, Data Mining, Recommender Systems, Personalization, Management, News Recommendation, Modeling And Simulation, Collaborative Filtering, Parallel Computing, Predictive Analytics, Knowledge Discovery, Large Scale Optimization, Conversational Recommender System, Computer Science, Cold-start Problem, Neural Architecture Search, Information Filtering System, Model Optimization, Group Recommenders, Memory-based CF Algorithm, Parallel Programming, Recommendation Systems
Collaborative filtering recommender systems are essential for locating items of interest and can boost revenue, yet the vast scale of e‑commerce data demands algorithms that balance recommendation quality with computational efficiency. This work introduces ClustKnn, a straightforward and intuitive algorithm designed for large‑scale datasets. ClustKnn first compresses data by building a simple clustering model, then rapidly generates recommendations using a nearest‑neighbor approach. Analytical and empirical results show that ClustKnn is highly scalable, intuitive, and achieves recommendation accuracy comparable to or better than popular CF algorithms.
Collaborative Filtering (CF)-based recommender systems are indispensable tools for finding items of interest among an unmanageable number of available items. Moreover, companies that deploy a CF-based recommender system may be able to increase revenue by drawing customers’ attention to items that they are likely to buy. However, the sheer number of customers and items typical in e-commerce systems demands specially designed CF algorithms that can gracefully cope with the vast size of the data. Many algorithms proposed thus far, where the principal concern is recommendation quality, may be too expensive to operate in a large-scale system. We propose ClustKnn, a simple and intuitive algorithm that is well suited for large data sets. The method first compresses data tremendously by building a straightforward but efficient clustering model. Recommendations are then generated quickly by using a simple Nearest Neighbor-based approach. We demonstrate the feasibility of ClustKnn both analytically and empirically. We also show, by comparing with a number of other popular CF algorithms, that apart from being highly scalable and intuitive, ClustKnn provides very good recommendation accuracy as well.
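The two-stage idea described in the abstract (compress users into a small clustering model, then recommend via nearest neighbors over that model) can be sketched as below. This is a minimal illustrative sketch, not the paper's exact method: it assumes a dense user-item rating matrix, plain k-means for the clustering stage, and cosine similarity for the neighbor stage, where the paper's actual clustering procedure, similarity measure, and handling of missing ratings may differ.

```python
import numpy as np

def cluster_model(ratings, k, iters=20, seed=0):
    """Model-building stage: compress the user-item rating matrix into k
    centroid 'surrogate users' via plain k-means (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k distinct real users
    centroids = ratings[rng.choice(len(ratings), size=k, replace=False)]
    for _ in range(iters):
        # assign each user to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(ratings[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned users
        for c in range(k):
            members = ratings[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

def predict(user, centroids, item, n_neighbors=2):
    """Recommendation stage: predict a rating from the n nearest surrogate
    users, weighted by cosine similarity to the target user."""
    sims = centroids @ user / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(user) + 1e-9
    )
    top = np.argsort(sims)[::-1][:n_neighbors]
    weights = sims[top]
    return float(weights @ centroids[top, item] / (weights.sum() + 1e-9))
```

The scalability argument follows from the shapes involved: the neighbor search runs over k centroids rather than all users, so prediction cost no longer grows with the number of customers once the model is built.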