Incentivizing High-Quality Content in Online Recommender Systems
Xinyan Hu, Meena Jagadeesan, Michael I. Jordan, Jacob Steinhardt
TL;DR
The paper shows that when a recommender platform runs a standard no-regret online learning algorithm in a Stackelberg game with content producers, producer effort diminishes and user welfare is suboptimal over time. It develops incentive-aware algorithms, including punitive and welfare-focused schemes, that sustain high producer effort and align content with heterogeneous user preferences. The methods use a dynamic Stackelberg framework with $D$-dimensional content vectors, contextual linear bandits, and punishment criteria designed to shape equilibrium behavior. The findings expose a tension between learning efficiency and content quality and offer guidance for platform design that improves user welfare while preserving adaptive learning. The work thus contributes both negative results for conventional learners and constructive tools for mitigating misaligned incentives in online recommender systems.
Abstract
In content recommender systems such as TikTok and YouTube, the platform's recommendation algorithm shapes content producer incentives. Many platforms employ online learning, which generates intertemporal incentives, since content produced today affects recommendations of future content. We study the game between producers and analyze the content created at equilibrium. We show that standard online learning algorithms, such as Hedge and EXP3, unfortunately incentivize producers to create low-quality content, where producers' effort approaches zero in the long run for typical learning rate schedules. Motivated by this negative result, we design learning algorithms that incentivize producers to invest high effort and achieve high user welfare. At a conceptual level, our work illustrates the unintended impact that a platform's learning algorithm can have on content quality and introduces algorithmic approaches to mitigating these effects.
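For readers unfamiliar with Hedge, the following is a minimal sketch of the textbook multiplicative-weights update with a decaying learning rate, written for a platform choosing among a fixed set of producers. It is not taken from the paper: the names `hedge_probabilities`, `run_hedge`, and `eta_schedule`, the full-information reward model, and the $1/\sqrt{t}$ schedule are all illustrative assumptions, and the sketch does not model producers' strategic effort choices.

```python
import math


def hedge_probabilities(cumulative_rewards, eta):
    """Hedge distribution: probability proportional to exp(eta * cumulative reward)."""
    # Subtracting the max keeps the exponentials numerically stable
    # and leaves the normalized distribution unchanged.
    best = max(cumulative_rewards)
    weights = [math.exp(eta * (r - best)) for r in cumulative_rewards]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(reward_rounds, eta_schedule):
    """Return the recommendation distribution used in each round.

    reward_rounds: list of per-round reward vectors, one entry per producer
                   (assumed full-information feedback, rewards in [0, 1]).
    eta_schedule:  hypothetical callable mapping round t (1-indexed) to a learning rate.
    """
    num_producers = len(reward_rounds[0])
    cumulative = [0.0] * num_producers
    history = []
    for t, rewards in enumerate(reward_rounds, start=1):
        # The distribution for round t depends only on rewards from earlier rounds.
        history.append(hedge_probabilities(cumulative, eta_schedule(t)))
        cumulative = [c + r for c, r in zip(cumulative, rewards)]
    return history


# Example: producer 0 consistently yields higher reward than producer 1.
rounds = [[1.0, 0.0]] * 5
history = run_hedge(rounds, eta_schedule=lambda t: 1.0 / math.sqrt(t))
```

In this sketch the first round is uniform because no rewards have been observed, and later rounds shift probability toward the producer with the higher cumulative reward. This dependence of future recommendations on past content is the source of the intertemporal incentives the abstract describes.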
