Revisiting BPR: A Replicability Study of a Common Recommender System Baseline

Aleksandr Milogradskii, Oleg Lashinin, Alexander P, Marina Ananyeva, Sergey Kolesnikov

TL;DR

With proper hyperparameter tuning, the BPR model can achieve performance close to state-of-the-art methods on top-n recommendation tasks and even outperform them on specific datasets.

Abstract

Bayesian Personalized Ranking (BPR), a collaborative filtering approach based on matrix factorization, frequently serves as a benchmark in recommender systems research. However, numerous studies overlook the nuances of BPR implementation and claim that it performs worse than newly proposed methods across various tasks. In this paper, we thoroughly examine the features of the BPR model, indicate their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, which lead to a significant decrease in performance of up to 50% for specific implementations. Furthermore, through extensive experiments on real-world datasets under modern evaluation settings, we demonstrate that with proper hyperparameter tuning, the BPR model can achieve performance close to state-of-the-art methods on top-n recommendation tasks and even outperform them on specific datasets. Specifically, on the Million Song Dataset, the BPR model with tuned hyperparameters statistically significantly outperforms Mult-VAE by 10% in NDCG@100 with a binary relevance function.
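
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of NDCG@k with a binary relevance function, where a recommended item counts as relevant (gain 1) if it appears in the user's held-out set and irrelevant (gain 0) otherwise. The function name `ndcg_at_k` and its arguments are illustrative assumptions, not the paper's evaluation code.

```python
import math


def ndcg_at_k(ranked_items, relevant_items, k=100):
    """NDCG@k with binary relevance (illustrative sketch, not the paper's code).

    ranked_items: recommended item ids, best first.
    relevant_items: held-out item ids for the same user.
    """
    relevant = set(relevant_items)
    # DCG: each relevant item at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(ranked_items[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (at most k) placed at the top of the list.
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg
```

A perfect ranking yields 1.0, and a list with no relevant items yields 0.0. In a typical evaluation the per-user values are averaged over all test users.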

Paper Structure

This paper contains 22 sections, 5 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Number of citations of BPR per year according to Google Scholar, as of May 15, 2024.
  • Figure 2: Area Under the ROC Curve (AUC) prediction quality for the Netflix dataset using various open-source BPR implementations. The ItemPop model is included to compare performance against a non-personalized baseline.
  • Figure 3: Performance in NDCG@100 relative to the number of embedding dimensions on the two datasets.
  • Figure 4: Mean absolute value of the first moment estimate in the Adam optimizer with $\beta_1 \in \{0.2, 0.9\}$ and uniform or adaptive negative sampling for two datasets over the first 300,000 training iterations. The values are averaged over every 1,000 iterations. Each drop on the graph indicates the beginning of an epoch.
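
The quantity plotted in Figure 4 can be illustrated with the standard Adam first-moment update, $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$. The sketch below tracks $|m_t|$ for a single scalar parameter and omits bias correction. The helper name `first_moment_trace` and the toy gradient sequence are assumptions made for illustration, not the paper's measurement code, which reports a mean over parameters.

```python
def first_moment_trace(grads, beta1):
    """Return |m_t| after each step of Adam's first-moment update.

    Illustrative sketch for one scalar parameter, without bias correction.
    """
    m = 0.0
    trace = []
    for g in grads:
        # Exponential moving average of the gradient.
        m = beta1 * m + (1.0 - beta1) * g
        trace.append(abs(m))
    return trace


# A single non-zero gradient followed by zeros, as can happen for an
# embedding that is rarely touched during training.
toy_grads = [1.0, 0.0, 0.0]
slow_decay = first_moment_trace(toy_grads, beta1=0.9)
fast_decay = first_moment_trace(toy_grads, beta1=0.2)
```

With $\beta_1 = 0.9$ the estimate reacts weakly to the new gradient and then decays slowly, while with $\beta_1 = 0.2$ it reacts strongly and is forgotten within a few steps.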