Investigating Popularity Bias Amplification in Recommender Systems Employed in the Entertainment Domain

Dominik Kowald

TL;DR

The paper analyzes popularity bias amplification in entertainment recommender systems and its implications for fairness under trustworthy AI. It uses Last.fm, MovieLens, and MyAnimeList to quantify amplification with three metrics: differences in mean absolute error (MAE), miscalibration measured via $KL(p\|q)$, and popularity lift measured via $PL(g)$. These are compared across the LowPop, MedPop, and HighPop user groups for the UserKNN and NMF algorithms. The key finding is that LowPop users experience lower accuracy, higher miscalibration, and higher popularity lift. The music domain comes with caveats due to repeat consumption, which motivates a weighted PL variant. The work informs regulator-relevant discussions on fairness and calibration in recommender systems and outlines concrete avenues for metric refinement, mitigation strategies, and online evaluations.
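To make the two distribution-based metrics named above more concrete, the following is a minimal Python sketch. It is not taken from the paper: it assumes the definitions commonly used in the calibration and popularity-bias literature, namely a smoothed Kullback-Leibler divergence between a user's profile genre distribution $p$ and the recommendation genre distribution $q$, and $PL(g) = (GAP_q(g) - GAP_p(g)) / GAP_p(g)$, where GAP is the group average popularity. The function names, the smoothing weight `alpha`, and the input formats are hypothetical.

```python
import math


def kl_miscalibration(p, q, alpha=0.01):
    """KL(p || q) between two genre distributions given as dicts.

    Assumption: q is smoothed toward p with weight alpha so the
    divergence stays finite when a genre is missing from q.
    """
    total = 0.0
    for genre, p_g in p.items():
        if p_g == 0:
            continue  # terms with p(g) = 0 contribute nothing
        q_smoothed = (1 - alpha) * q.get(genre, 0.0) + alpha * p_g
        total += p_g * math.log(p_g / q_smoothed)
    return total


def group_average_popularity(item_pops_per_user):
    """Mean over users of the mean item popularity in each user's list."""
    per_user = [sum(pops) / len(pops) for pops in item_pops_per_user]
    return sum(per_user) / len(per_user)


def popularity_lift(profile_pops, rec_pops):
    """PL(g) = (GAP_q(g) - GAP_p(g)) / GAP_p(g) for one user group g.

    profile_pops / rec_pops: one list of item popularity scores per user,
    for the user profiles and the recommendation lists respectively.
    """
    gap_p = group_average_popularity(profile_pops)
    gap_q = group_average_popularity(rec_pops)
    return (gap_q - gap_p) / gap_p
```

Under these assumptions, a positive `popularity_lift` means the recommendations for the group are on average more popular than the items in the group's profiles, and a larger `kl_miscalibration` means the recommended genre mix deviates more from the user's own genre mix.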

Abstract

Recommender systems have become an integral part of our daily online experience by analyzing past user behavior to suggest relevant content in entertainment domains such as music, movies, and books. Today, they are among the most widely used applications of AI and machine learning. Consequently, regulations and guidelines for trustworthy AI, such as the European AI Act, which addresses issues like bias and fairness, are highly relevant to the design, development, and evaluation of recommender systems. One particularly important type of bias in this context is popularity bias, which results in the unfair underrepresentation of less popular content in recommendation lists. This work summarizes our research on investigating the amplification of popularity bias in recommender systems within the entertainment sector. Analyzing datasets from three entertainment domains, music, movies, and anime, we demonstrate that an item's recommendation frequency is positively correlated with its popularity. As a result, user groups with little interest in popular content receive less accurate recommendations compared to those who prefer widely popular items. Furthermore, this work contributes to a better understanding of the connection between recommendation accuracy, calibration quality of algorithms, and popularity bias amplification.
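The correlation between item popularity and recommendation frequency mentioned in the abstract can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the paper's evaluation code: item popularity is assumed to be the number of distinct users who interacted with an item, recommendation frequency is the number of recommendation lists containing it, and the Pearson coefficient is used as the correlation measure. The function names and the dictionary-based input format are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def popularity_vs_rec_frequency(user_profiles, rec_lists):
    """Correlate item popularity with how often items are recommended.

    user_profiles: dict user -> list of items the user interacted with
    rec_lists:     dict user -> list of items recommended to the user
    """
    # Popularity: number of distinct users with the item in their profile.
    popularity = Counter(
        item for items in user_profiles.values() for item in set(items)
    )
    # Recommendation frequency: number of lists in which the item appears.
    rec_freq = Counter(
        item for recs in rec_lists.values() for item in set(recs)
    )
    items = sorted(popularity)
    xs = [popularity[item] for item in items]
    ys = [rec_freq.get(item, 0) for item in items]
    return pearson(xs, ys)
```

A coefficient close to 1 on such data would indicate that the algorithm recommends popular items disproportionately often, which is the pattern the abstract describes.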

Paper Structure

This paper contains 8 sections, 2 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Correlation of music artist popularity and recommendation frequency in the Last.fm dataset. Both algorithms investigated tend to favor popular music artists [kowald2020unfairness, kowald2021support]. Similar results can be obtained for the movie and anime domains [kowald2022popularity, ecir_bias_2023].