Harm Mitigation in Recommender Systems under User Preference Dynamics

Jerry Chee, Shankar Kalyanaraman, Sindhu Kiranmai Ernala, Udi Weinsberg, Sarah Dean, Stratis Ioannidis

TL;DR

This paper tackles harm mitigation in recommender systems where user preferences evolve under exposure to recommendations. It introduces a dynamic attraction-based model that couples recommendation decisions to steady-state user profiles, producing a non-convex optimization problem for policy design. A contraction-based fixed-point analysis guarantees a unique stationary profile, and gradient-based methods via the implicit-function theorem enable effective policy optimization at stationarity. Empirical results on semi-synthetic MovieLens data show gradient-based policies achieve substantial improvements in the CTR–harm tradeoff, outperforming baselines by up to 77% and illustrating the importance of incorporating user dynamics into harm mitigation strategies.

Abstract

We consider a recommender system that takes into account the interplay between recommendations, the evolution of user interests, and harmful content. We model the impact of recommendations on user behavior, particularly the tendency to consume harmful content. We seek recommendation policies that establish a tradeoff between maximizing click-through rate (CTR) and mitigating harm. We establish conditions under which the user profile dynamics have a stationary point, and propose algorithms for finding an optimal recommendation policy at stationarity. We experiment on a semi-synthetic movie recommendation setting initialized with real data and observe that our policies outperform baselines at simultaneously maximizing CTR and mitigating harm.
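The gradient-based optimization at stationarity mentioned in the TL;DR rests on a standard implicit-function identity. As a sketch only, assume the stationary profile can be written as $\bar{u}(\pi) = F(\pi, \bar{u}(\pi))$ for a policy parameter $\pi$ and a differentiable map $F$ (notation assumed here for illustration), and that $I - \partial F/\partial u$ is invertible, which holds when $F$ is a contraction in $u$. Differentiating both sides with respect to $\pi$ and solving for the sensitivity of the stationary profile gives:

$$
\frac{\partial \bar{u}}{\partial \pi}
  = \frac{\partial F}{\partial \pi} + \frac{\partial F}{\partial u}\,\frac{\partial \bar{u}}{\partial \pi}
\quad\Longrightarrow\quad
\frac{\partial \bar{u}}{\partial \pi}
  = \Bigl(I - \frac{\partial F}{\partial u}\Bigr)^{-1}\frac{\partial F}{\partial \pi},
$$

with all partial derivatives evaluated at $(\pi, \bar{u}(\pi))$. The chain rule then yields the gradient of any differentiable objective of $\bar{u}$ with respect to $\pi$ without unrolling the dynamics.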

Paper Structure

This paper contains 29 sections, 1 theorem, 68 equations, 65 figures, 5 tables.

Key Result

lemma 1

Stationary user profiles $\bar{u}\in \mathbb{R}^d$ satisfy a fixed-point equation under a map $F:\Pi\times \mathbb{R}^d\to\mathbb{R}^d$, whose definition involves $p_\mathtt{H}$ and $p_\mathtt{NH}$, given by Eq. eq:phm, and $p_v$, the probability that the user selects item $v\in \Omega$.
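To make the fixed-point statement concrete, the following is a minimal sketch, not the paper's model. The map `toy_F`, the mixing weight `alpha`, the step size `beta`, the inherent profile `u0`, and the item matrix `V` are all hypothetical stand-ins chosen so that the map is a contraction in the profile. The sketch finds a stationary profile by fixed-point iteration and computes its sensitivity to the policy with the implicit-function identity $(I - \partial F/\partial u)^{-1}\,\partial F/\partial \pi$.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def toy_F(pi, u, V, u0, beta=0.3, alpha=0.5):
    """Hypothetical attraction-style map (an assumption, NOT the paper's F).

    The user picks item v with probability p_v, a mix of the policy pi and an
    organic softmax choice; the profile is pulled toward the expected item.
    """
    p = alpha * pi + (1.0 - alpha) * softmax(V @ u)
    return (1.0 - beta) * u0 + beta * (V.T @ p)

def stationary_profile(pi, V, u0, tol=1e-12, max_iter=10_000, **kw):
    """Iterate u <- toy_F(pi, u) until successive iterates differ by < tol."""
    u = u0.copy()
    for _ in range(max_iter):
        u_next = toy_F(pi, u, V, u0, **kw)
        if np.linalg.norm(u_next - u) < tol:
            return u_next
        u = u_next
    raise RuntimeError("fixed-point iteration did not converge")

def implicit_jacobian(pi, u_bar, V, beta=0.3, alpha=0.5):
    """d u_bar / d pi = (I - dF/du)^{-1} dF/dpi, evaluated at (pi, u_bar)."""
    s = softmax(V @ u_bar)
    # Softmax Jacobian w.r.t. logits is diag(s) - s s^T; logits are V @ u.
    dF_du = beta * (1.0 - alpha) * V.T @ (np.diag(s) - np.outer(s, s)) @ V
    dF_dpi = beta * alpha * V.T
    return np.linalg.solve(np.eye(len(u_bar)) - dF_du, dF_dpi)

# Toy data: 5 unit-norm items in 3 dimensions, uniform policy.
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 3))
V /= np.linalg.norm(V, axis=1, keepdims=True)
u0 = V[0].copy()
pi = np.full(5, 0.2)

u_bar = stationary_profile(pi, V, u0)
J = implicit_jacobian(pi, u_bar, V)   # shape (3, 5): d u_bar / d pi
```

With unit-norm items, the Jacobian of `toy_F` in `u` has norm at most `beta * (1 - alpha)`, which is below 1 here, so the iteration converges to a unique point. The implicit Jacobian avoids differentiating through the iteration loop, which is the practical appeal of optimizing at stationarity.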

Figures (65)

  • Figure 1: Our full model, incorporating user dynamics in the presence of recommender system interactions. A recommender presents a recommendation set $E_t$ to the user, who chooses to either interact with the recommended content ($\mathtt{CLK}$), or organically select an item from the entire catalog ($\mathtt{ORG}$), which includes harmful content. The user profile is subsequently updated under an attraction model [lu2014optimal, krauth2020offline, ge2020understanding, mansoury2020feedback], leaning closer to the item $v(t)$ selected by the user.
  • Figure 4: Effect of modifying $\lambda$, $\beta$, and $c$ on the objective attained by different policies for the Action genre. Additional genres, and the impact on $p_\mathtt{CLK}$ and $p_\mathtt{H}$, are shown in Appendix \ref{sec:supp_exp}. Increasing any parameter decreases the objective attained by every policy. We observe that increasing $\lambda$ naturally increases the performance gap of the Grad policy. Increasing $\beta$ has the opposite effect, as it limits the ability of all policies to impact a user's profile. Increasing $c$ also widens the improvement of Grad over other policies: the larger $c$ is, the less likely the recommendation is to be accepted, and the more important it becomes to successfully minimize harm.
  • Figure 11: Effect of modifying $\lambda$, $\beta$, and $c$ on the objective attained by different policies for different genres. Increasing any parameter decreases the objective attained by every policy. We observe that increasing $\lambda$ naturally increases the performance gap of the Grad policy. Increasing $\beta$ has the opposite effect, as it limits the ability of the policy to impact a user's profile. Increasing $c$ also widens the improvement of Grad over other policies: the larger $c$ is, the less likely the recommendation is to be accepted, and the more important it becomes to minimize harm.
  • Figure 12: Effect of modifying $\lambda$, $\beta$, and $c$ on $p_\mathtt{CLK}$ attained by different policies for different genres. Increasing any parameter decreases the objective attained by every policy. We observe that increasing $\lambda$ naturally increases the performance gap of the Grad policy. Increasing $\beta$ has the opposite effect, as it limits the ability of the policy to impact a user's profile. Increasing $c$ also widens the improvement of Grad over other policies: the larger $c$ is, the less likely the recommendation is to be accepted, and the more important it becomes to minimize harm.
  • Figure 13: Effect of modifying $\lambda$, $\beta$, and $c$ on $p_\mathtt{H}$ attained by different policies for different genres. Increasing any parameter decreases the objective attained by every policy. We observe that increasing $\lambda$ naturally increases the performance gap of the Grad policy. Increasing $\beta$ has the opposite effect, as it limits the ability of the policy to impact a user's profile. Increasing $c$ also widens the improvement of Grad over other policies: the larger $c$ is, the less likely the recommendation is to be accepted, and the more important it becomes to minimize harm.
  • ...and 60 more figures

Theorems & Definitions (1)

  • lemma 1