Rethinking Recommender Systems: Cluster-based Algorithm Selection

Andreas Lizenberger, Ferdinand Pfeifer, Bastian Polewka

TL;DR

The paper tackles the problem of improving recommender-system performance by tailoring algorithms to user clusters. It introduces a Cluster-based AutoRecSys pipeline that combines four clustering methods (two k-means variants and two graph-based methods) with eight recommendation algorithms, evaluated across eight datasets, and uses per-cluster best algorithms to form a weighted, combined recommender. Empirical results show significant gains in $nDCG@10$ on five of eight datasets, with average improvements up to $66.47\%$ across datasets; however, gains are dataset-dependent and no single clustering method dominates all cases. The work demonstrates that clustering information can effectively guide algorithm selection and hyperparameter optimization, reducing runtime and enabling substantial performance improvements in practical settings. The authors advocate for broader exploration of clustering approaches and meta-information to further enhance cluster-based algorithm selection in AutoRecSys.
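
The metric named above, nDCG@10, can be sketched in a few lines. This is a minimal illustration, assuming binary relevance (an item is either relevant or not) and a base-2 logarithmic discount; the summary does not state which gain function the authors use, and the function names here are hypothetical.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions (rank 0 is the top)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """nDCG@k with binary relevance (assumption): 1.0 if the item is relevant, else 0.0."""
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # Ideal ranking: all relevant items first, capped at k positions.
    ideal_gains = [1.0] * min(len(relevant_items), k)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant items: define the score as 0
    return dcg_at_k(gains, k) / ideal_dcg
```

A list that places the only relevant item first scores 1.0; placing it second lowers the score because the gain is discounted by its rank.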

Abstract

Cluster-based algorithm selection deals with selecting recommendation algorithms on clusters of users to obtain performance gains. Many combinations of clustering approaches and recommendation algorithms have not yet been studied. We want to show that clustering users prior to algorithm selection increases the performance of recommendation algorithms. Our study covers eight datasets, four clustering approaches, and eight recommendation algorithms. We select the best-performing recommendation algorithm for each cluster. Our work shows that cluster-based algorithm selection is an effective technique for optimizing recommendation algorithm performance. For five out of eight datasets, we report an increase in nDCG@10 between 19.28% (0.032) and 360.38% (0.191) compared to algorithm selection without prior clustering.
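
The selection step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that per-user nDCG@10 scores for every algorithm are already available (for example from a validation split), that cluster assignments are given, and that the combined score is the plain mean over all users. The algorithm names `popularity` and `item_knn` and all numbers are hypothetical.

```python
from collections import defaultdict


def best_algorithm(users, user_scores):
    """Return the algorithm with the highest mean score over the given users."""
    def mean_score(algo):
        return sum(user_scores[algo][u] for u in users) / len(users)
    return max(user_scores, key=mean_score)


def select_per_cluster(user_cluster, user_scores):
    """Pick the best algorithm separately for each cluster of users."""
    members = defaultdict(list)
    for user, cluster in user_cluster.items():
        members[cluster].append(user)
    return {c: best_algorithm(users, user_scores) for c, users in members.items()}


def combined_score(user_cluster, user_scores, choice):
    """Mean score over all users when each user is served by their cluster's algorithm."""
    total = sum(user_scores[choice[c]][u] for u, c in user_cluster.items())
    return total / len(user_cluster)


# Hypothetical toy data: two clusters, two algorithms, per-user nDCG@10.
user_cluster = {"u1": 0, "u2": 0, "u3": 1, "u4": 1}
user_scores = {
    "popularity": {"u1": 0.4, "u2": 0.4, "u3": 0.1, "u4": 0.1},
    "item_knn": {"u1": 0.1, "u2": 0.1, "u3": 0.3, "u4": 0.3},
}

# Baseline: algorithm selection without clustering (one algorithm for everyone).
global_best = best_algorithm(list(user_cluster), user_scores)
baseline = combined_score(user_cluster, user_scores, {0: global_best, 1: global_best})

# Cluster-based selection: one algorithm per cluster.
choice = select_per_cluster(user_cluster, user_scores)
clustered = combined_score(user_cluster, user_scores, choice)

absolute_gain = clustered - baseline
relative_gain = absolute_gain / baseline
```

In this toy data the single best algorithm reaches a mean of 0.25, while choosing per cluster reaches 0.35, which is an absolute gain of 0.10 and a relative gain of 40%. Averaging over users is the same as weighting each cluster's mean by its size, which is one plausible way to combine the per-cluster results.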

Paper Structure (20 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Cluster-based AutoRecSys process.
  • Figure 2: Normalized Histogram: number of interactions to number of users.
  • Figure 3: $k$-Means number of interactions with nDCG@10.
  • Figure 4: $k$-Means item-interaction vector with nDCG@10.
  • Figure 5: Louvain with nDCG@10.
  • ...and 3 more figures