With a Little Help From My Friends: Collective Manipulation in Risk-Controlling Recommender Systems

Giovanni De Toni, Cristian Consonni, Erasmo Purificato, Emilia Gomez, Bruno Lepri

Abstract

Recommendation systems have become central gatekeepers of online information, shaping user behaviour across a wide range of activities. In response, users increasingly organize and coordinate to steer algorithmic outcomes toward diverse goals, such as promoting relevant content or limiting harmful material, relying on platform affordances -- such as likes, reviews, or ratings. While these mechanisms can serve beneficial purposes, they can also be leveraged for adversarial manipulation, particularly in systems where such feedback directly informs safety guarantees. In this paper, we study this vulnerability in recently proposed risk-controlling recommender systems, which use binary user feedback (e.g., "Not Interested") to provably limit exposure to unwanted content via conformal risk control. We empirically demonstrate that their reliance on aggregate feedback signals makes them inherently susceptible to coordinated adversarial user behaviour. Using data from a large-scale online video-sharing platform, we show that a small coordinated group (comprising only 1% of the user population) can induce up to a 20% degradation in nDCG for non-adversarial users by exploiting the affordances provided by risk-controlling recommender systems. We evaluate simple, realistic attack strategies that require little to no knowledge of the underlying recommendation algorithm and find that, while coordinated users can significantly harm overall recommendation quality, they cannot selectively suppress specific content groups through reporting alone. Finally, we propose a mitigation strategy that shifts guarantees from the group level to the user level, showing empirically how it can reduce the impact of adversarial coordinated behaviour while ensuring personalized safety for individuals.
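To make the mechanism described above concrete, the following minimal sketch shows how such a system might prune a ranked slate with a threshold $\lambda$ and how binary "Not Interested" flags define the risk being controlled. The function names, the probabilistic "unwantedness" score, and the parameterization of $\lambda$ are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def filtered_top_k(relevance, p_unwanted, lam, k=20):
    """Recommend the k most relevant items among those the filter admits.

    relevance  : (n_items,) relevance scores, higher is better
    p_unwanted : (n_items,) predicted probability the user would flag
                 the item as "Not Interested"
    lam        : filtering threshold; items are admitted when
                 p_unwanted <= 1 - lam, so larger lam prunes more
    """
    admitted = np.flatnonzero(p_unwanted <= 1.0 - lam)
    order = admitted[np.argsort(-relevance[admitted])]  # rank admitted items
    return order[:k]

def empirical_risk(flags, recommended):
    """Fraction of recommended items the user actually flagged: the binary
    loss whose expectation conformal risk control keeps below alpha."""
    return float(flags[recommended].mean()) if len(recommended) else 0.0
```

Under this parameterization a larger $\lambda$ prunes more aggressively, so the empirical risk is non-increasing in $\lambda$, matching the infimum-based threshold rule in Theorem 1 below.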

Paper Structure

This paper contains 25 sections, 2 theorems, 15 equations, and 7 figures.

Key Result

Theorem 1

Consider a held-out calibration set $\mathcal{Q} = \{(u,i,h)_j\}_{j=1}^Q$, and assume that $K$ users within $\mathcal{Q}$, forming the set $\mathcal{K}$, behave strategically. Given a target level $\alpha \in [0,1]$, let $\hat{\lambda} \in \Lambda$ be chosen as

$$\hat{\lambda} = \inf \left\{ \lambda : \frac{K}{Q+1}\, r_\lambda^{adv} + \frac{Q-K}{Q+1}\, r_\lambda + \frac{B}{Q+1} \leq \alpha \right\},$$

where $r_\lambda^{adv} = \frac{1}{K}\sum_{u_{adv} \in \mathcal{K}} R(S_\lambda(U=u_{adv}, k))$ is the average empirical risk of the adversarial users, $r_\lambda$ is the analogous average over the $Q - K$ non-adversarial calibration users, and $B$ is an upper bound on the risk. Then the recommendation sets $S_{\hat{\lambda}}$ satisfy $\mathbb{E}\left[ R(S_{\hat{\lambda}}(U, k)) \right] \leq \alpha$.
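The threshold choice in Theorem 1 is a one-dimensional search over $\Lambda$. The sketch below implements that search on a discrete grid, assuming (as is standard in conformal risk control) that the per-user risk is non-increasing in $\lambda$; the callables `risk_adv` and `risk_std` are illustrative stand-ins for the empirical averages $r_\lambda^{adv}$ and $r_\lambda$, not the paper's implementation.

```python
import numpy as np

def crc_threshold(lambdas, risk_adv, risk_std, K, Q, alpha, B=1.0):
    """Smallest lambda whose inflated empirical risk, split into adversarial
    and honest contributions as in Theorem 1, meets the target alpha.

    risk_adv(lam) : average empirical risk of the K adversarial users
    risk_std(lam) : average empirical risk of the Q - K honest users
    B             : upper bound on the per-user risk (1 for a binary loss)
    Assumes both risk curves are non-increasing in lambda.
    """
    for lam in np.sort(np.asarray(lambdas)):
        r_hat = (K * risk_adv(lam) + (Q - K) * risk_std(lam) + B) / (Q + 1)
        if r_hat <= alpha:      # first feasible lambda = the infimum on the grid
            return float(lam)
    return None                 # no threshold controls the risk at this alpha
```

With $K = 0$ this reduces to the standard conformal risk control rule. The adversaries' only lever is $r_\lambda^{adv}$: by inflating it they force a larger, more aggressive $\hat{\lambda}$ (or make the target infeasible), degrading recommendation quality for everyone.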

Figures (7)

  • Figure 1: Expected empirical risk of adversarial users at calibration time as a function of the filtering threshold $\lambda \in [0,1]$, for reporting rates $\gamma \in \{0.001, 0.01, 0.1\}$ and different reporting strategies. The collective size is fixed to $\beta = 0.01$. Shaded areas indicate one standard deviation over 10 runs.
  • Figure 2: Expected performance reduction as a function of the fraction of adversarial users in the calibration set, $\beta \in [0.001, 0.1]$, for different reporting rates $\gamma \in \{0.001, 0.01, 0.1\}$ and reporting strategies. A reduction between 0 and 1 indicates an effect proportional to the size of the collective, whereas values greater than 1 indicate a disproportionate impact. Shaded areas denote confidence intervals obtained via bootstrapping over 10 runs.
  • Figure 3: Relationship between reporting intensity and risk reduction. A small fraction of carefully selected reported items in the calibration set (\ref{fig:fraction_reported_items_per_strategy}) can induce a sharp reduction in expected unwanted content for non-adversarial users relative to the baseline (None in \ref{fig:expected_harmfulness_standard_users}). Across all panels, the collective size is fixed to $\beta = 0.01$ and the reporting rate to $\gamma = 0.1$. We report the standard deviation over 10 runs as a shaded area or error bars.
  • Figure 4: Adversarial strategies substantially increase content repetition in recommendations, leading to degraded performance -- up to $80\%$ repeated items under the LowRisk strategy. The collective size is fixed to $\beta = 0.01$ and the reporting rate to $\gamma = 0.1$. We report the standard deviation over 10 runs as error bars.
  • Figure 5: Difference in top-$k$ exposure ($k = 20$) for group $g = 34$ in Kuaishou. Both strategies flag approximately $10\%$ of items, and the targeted Tag$(g = 34)$ strategy induces an exposure reduction for non-adversarial users similar to the untargeted Random$(\gamma)$, indicating no selective suppression of the targeted group. Results are shown for collective sizes $\beta \in \{0.001, 0.005, 0.01\}$, with confidence intervals obtained via bootstrapping over 10 runs.
  • ...and 2 more figures (a sketch of the evaluation loop behind these experiments follows this list)
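The captions above describe variants of the same experiment: seed a coordinated collective of relative size $\beta$, let each member flag items at rate $\gamma$ under some strategy, recalibrate the risk-controlling threshold on the poisoned calibration set, and measure outcomes for honest users only. Below is a minimal sketch of that loop; `strategy`, `calibrate`, and `evaluate` are hypothetical placeholders for the paper's components, not its actual code.

```python
import numpy as np

def run_attack_trial(user_ids, beta, gamma, strategy, calibrate, evaluate,
                     seed=0):
    """One simulated trial of a coordinated reporting attack (illustrative).

    beta     : fraction of users acting as the coordinated collective
    gamma    : per-user reporting rate
    strategy : rule selecting which items an adversary flags, e.g. the
               paper's Random, LowRisk, or Tag strategies
    """
    rng = np.random.default_rng(seed)
    n_adv = max(1, int(beta * len(user_ids)))
    adversaries = set(rng.choice(user_ids, size=n_adv, replace=False).tolist())

    # Adversaries inject strategic "Not Interested" flags; all other
    # calibration feedback is left untouched.
    poisoned_flags = {u: strategy(u, gamma, rng) for u in adversaries}
    lam_hat = calibrate(poisoned_flags)

    honest = [u for u in user_ids if u not in adversaries]
    return evaluate(lam_hat, honest)   # e.g. nDCG@20 or top-k exposure
```

Sweeping $\beta$ and $\gamma$ over grids and averaging `evaluate` over roughly 10 seeds yields curves of the kind shown in Figures 1 and 2.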

Theorems & Definitions (3)

  • Theorem 1
  • Corollary 2
  • Proof