Aligning Recommendations with User Popularity Preferences

Mona Schirmer, Anton Thielmann, Pola Schwöbel, Thomas Martynec, Giuseppe Di Benedetto, Ben London, Yannik Stein

Abstract

Popularity bias is a pervasive problem in recommender systems, where recommendations disproportionately favor popular items. This not only results in "rich-get-richer" dynamics and a homogenization of visible content, but can also lead to misalignment of recommendations with individual users' preferences for popular or niche content. This work studies popularity bias through the lens of user-recommender alignment. To this end, we introduce Popularity Quantile Calibration, a measurement framework that quantifies misalignment between a user's historical popularity preference and the popularity of their recommendations. Building on this notion of popularity alignment, we propose SPREE, an inference-time mitigation method for sequential recommenders based on activation steering. SPREE identifies a popularity direction in representation space and adaptively steers model activations based on an estimate of each user's personal popularity bias, allowing both the direction and magnitude of steering to vary across users. Unlike global debiasing approaches, SPREE explicitly targets alignment rather than uniformly reducing popularity. Experiments across multiple datasets show that SPREE consistently improves user-level popularity alignment while preserving recommendation quality.
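
The per-user steering idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the popularity direction is a normalized difference of mean activations between popular and niche items (a common choice in activation steering), and that the steering coefficient is the negated, scaled estimate of the user's signed popularity bias. The function names and the `strength` parameter are hypothetical.

```python
import math

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def popularity_direction(popular_acts, niche_acts):
    """Unit vector pointing from niche-item to popular-item activations.

    Assumption: a difference-of-means direction; the paper may derive
    its direction differently.
    """
    mu_pop = mean_vector(popular_acts)
    mu_niche = mean_vector(niche_acts)
    diff = [a - b for a, b in zip(mu_pop, mu_niche)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def steer(activation, direction, user_bias, strength=1.0):
    """Shift one activation vector along the popularity direction.

    user_bias > 0 means recommendations are more popular than the user's
    history, so the shift goes against the direction; user_bias < 0
    shifts along it. Assumption: a linear rule alpha = -strength * bias.
    """
    alpha = -strength * user_bias
    return [h + alpha * v for h, v in zip(activation, direction)]
```

Because the coefficient carries the sign of the estimated bias, the same direction vector can push one user's recommendations toward niche items and another's toward popular ones, which is the distinction the abstract draws against global debiasing.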

Paper Structure

This paper contains 35 sections, 12 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Problematic cases for the content validity of the UPD metric. Left: Recommendations for users $u$ (---) and $u'$ (- -) are positively and negatively biased, respectively. Nevertheless, UPD assigns both users the same popularity bias due to the symmetry of the underlying Jensen–Shannon divergence. Middle: UPD does not measure popularity bias within bins (red dashed lines). Right: Since UPD does not differentiate within bins, no popularity bias is reported despite different popularity variance in ${\mathcal{H}}_u$ and ${\mathcal{R}}_u$.
  • Figure 2: Visualization of popularity quantile calibration. Left: In the toy example, the user’s popularity preference $p(s | u)$ is more widely spread than that of the recommender, $q(s | u)$. Middle: Consequently, their CDFs diverge. PCE measures the difference between the user CDF $F_p(s)$ and the recommender CDF $F_q(s)$ at fixed quantiles $\tau \in \{0, 0.2, 0.4, 0.6, 0.8, 1\}$ (orange line). Right: The resulting calibration curve deviates from the perfectly calibrated oracle diagonal, indicating the extent of miscalibration.
  • Figure 3: Trade-off between performance (NDCG) and popularity alignment (PCE) across different datasets and mitigation baselines. Points represent different mitigation strengths. The plots show the vicinity of the base model (larger plots on the left) and the global trends (smaller plots on the right).
  • Figure 4: Average calibration curves across users. Except for fs-tky, where the base model is already well calibrated, the recommender tends to favor less popular items (black curve below the dashed gray identity line). SPREE (green) mitigates this bias by bringing the curve closer to the oracle identity. In contrast, baselines that reduce global popularity (PopSteer and IPR) increase this miscalibration, while Personalized Popularity (PP) may overcorrect by recommending only the most popular items in the user’s history.
  • Figure 5: Linear probe accuracy across sequence positions $t$ and transformer blocks $l$ for ml-1m, ml-20m, and fs-tky. After the padded region, accuracy rapidly converges toward 100%, indicating that popularity is linearly separable in activation space. Popularity is most strongly encoded at the final sequence positions in the last block.
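
The calibration curve described in the Figure 2 caption can be sketched on empirical popularity scores. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes the curve evaluates the user's empirical CDF at the recommender's $\tau$-quantiles, an orientation chosen so that recommendations skewed toward less popular items fall below the diagonal, as described for Figure 4. The mean absolute gap is an assumed summary and may differ from the exact PCE definition. All function names are hypothetical.

```python
from bisect import bisect_right

def empirical_quantile(sorted_vals, tau):
    """Linearly interpolated tau-quantile of a sorted, non-empty list."""
    pos = tau * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def calibration_curve(history_pop, rec_pop,
                      taus=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """For each tau, the fraction of the user's history whose popularity
    is at or below the recommender's tau-quantile (assumed orientation)."""
    hist = sorted(history_pop)
    recs = sorted(rec_pop)
    curve = []
    for tau in taus:
        s_tau = empirical_quantile(recs, tau)              # recommender quantile
        curve.append(bisect_right(hist, s_tau) / len(hist))  # user CDF at s_tau
    return curve

def mean_abs_calibration_gap(curve, taus):
    """Average distance of the curve from the diagonal (assumed summary)."""
    return sum(abs(c - t) for c, t in zip(curve, taus)) / len(taus)
```

A curve that stays on the diagonal corresponds to matching popularity distributions; a curve below it means the recommended items are less popular than the user's history, and a curve above it means they are more popular.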

Theorems & Definitions (1)

  • Definition 1: Popularity quantile-calibrated