On Inherited Popularity Bias in Cold-Start Item Recommendation

Gregor Meehan, Johan Pauwels

TL;DR

This work studies how generative cold-start item recommenders inherit popularity bias from warm CF models, leading to overexposure of certain items when cold items lack interaction data. By analyzing Heater, GAR, and GoRec supervised by a pre-trained warm model (FREEDOM) across three multimedia datasets, the authors show that content-based predictions can assign high predicted popularity to a subset of items, amplifying bias in cold-start predictions. They introduce a simple post-processing method that rescales each item embedding $x_c$ by a factor $\gamma_c = \frac{\lVert x_c \rVert + \alpha \mu_w}{\lVert x_c \rVert (1+\alpha)}$, chosen so that $\lVert \gamma_c x_c \rVert - \mu_w = \frac{\lVert x_c \rVert - \mu_w}{1+\alpha}$, thereby balancing exposure without severely harming user-level accuracy. Across datasets, this magnitude-based mitigation increases exposure diversity (higher $\text{Gini-Div}$) and improves low-end item MDG while maintaining overall performance, suggesting a practical route to fairer cold-start recommendations; code is released for replication.
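
The scaling rule in the summary above can be illustrated with a minimal sketch. It is not the authors' released code. It assumes that $\mu_w$ denotes the mean embedding magnitude of warm items and that $\alpha \ge 0$ is a strength parameter (the summary defines neither), and the function name `rescale_embeddings` is hypothetical.

```python
import math


def rescale_embeddings(embeddings, warm_mean_norm, alpha):
    """Scale each embedding x_c by gamma_c = (||x_c|| + alpha*mu_w) / (||x_c|| * (1 + alpha)).

    Assumptions (not stated in the summary): warm_mean_norm plays the role of
    mu_w, taken here to be the mean magnitude of warm item embeddings, and
    alpha >= 0 controls how strongly magnitudes are pulled toward mu_w.
    """
    rescaled = []
    for x in embeddings:
        norm = math.sqrt(sum(v * v for v in x))
        if norm == 0.0:
            # A zero vector has no direction to rescale; leave it unchanged.
            rescaled.append(list(x))
            continue
        gamma = (norm + alpha * warm_mean_norm) / (norm * (1.0 + alpha))
        rescaled.append([gamma * v for v in x])
    return rescaled
```

For example, with $x_c = (3, 4)$, $\mu_w = 1$ and $\alpha = 1$, the factor is $\gamma_c = 0.6$ and the magnitude drops from 5 to 3, so its distance from $\mu_w$ is halved from 4 to 2, as the rule requires. Setting $\alpha = 0$ gives $\gamma_c = 1$ and leaves every embedding unchanged.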

Abstract

Collaborative filtering (CF) recommender systems struggle with making predictions on unseen, or 'cold', items. Systems designed to address this challenge are often trained with supervision from warm CF models in order to leverage collaborative and content information from the available interaction data. However, since they learn to replicate the behavior of CF methods, cold-start models may therefore also learn to imitate their predictive biases. In this paper, we show that cold-start systems can inherit popularity bias, a common cause of recommender system unfairness arising when CF models overfit to more popular items, thereby maximizing user-oriented accuracy but neglecting rarer items. We demonstrate that cold-start recommenders not only mirror the popularity biases of warm models, but are in fact affected more severely: because they cannot infer popularity from interaction data, they instead attempt to estimate it based solely on content features. This leads to significant over-prediction of certain cold items with similar content to popular warm items, even if their ground truth popularity is very low. Through experiments on three multimedia datasets, we analyze the impact of this behavior on three generative cold-start methods. We then describe a simple post-processing bias mitigation method that, by using embedding magnitude as a proxy for predicted popularity, can produce more balanced recommendations with limited harm to user-oriented cold-start accuracy.

Paper Structure

This paper contains 11 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Warm and cold item prediction counts with $k=20$ against the number of target users (i.e. the number of times an item appears in the validation or test set interactions) for the Electronics dataset. Each dot represents an item: we only include items with at least one holdout set interaction, so there are 44,083 in the warm plots and 12,601 in the cold plots.
  • Figure 2: Cold item prediction counts for the top 10% most predicted cold Electronics items. The x-axis is the maximum popularity value among the top 10 closest warm neighbors to each item by cosine similarity in the content features.
  • Figure 3: Cold prediction counts at $k=20$ against item vector magnitude in the Electronics dataset.
  • Figure 4: Cold test set item prediction counts at $k=20$ against item prediction count percentiles (i.e. each item's position in the sorted list of prediction counts) in the Electronics dataset. Only items predicted at least once are plotted.
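
The prediction counts plotted in the figures above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: it assumes an item's prediction count is the number of users whose top-$k$ list contains that item, and that scores are dot products between user and item embeddings. The function name `prediction_counts` is hypothetical.

```python
import heapq
from collections import Counter


def prediction_counts(user_embeddings, item_embeddings, k=20):
    """Count, for each item, how many users have it in their top-k list.

    Assumption: the score of an item for a user is the dot product of their
    embeddings, so items with larger embedding magnitude tend to score
    higher for many users.
    """
    counts = Counter()
    for u in user_embeddings:
        scores = [sum(a * b for a, b in zip(u, x)) for x in item_embeddings]
        # Indices of the k highest-scoring items for this user.
        top_k = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
        counts.update(top_k)
    return [counts[i] for i in range(len(item_embeddings))]
```

Under dot-product scoring, a single item with a large embedding magnitude can dominate many users' lists, which is the kind of relationship between prediction count and item vector magnitude that Figure 3 plots.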