Can Probabilistic Feedback Drive User Impacts in Online Platforms?
Jessica Dai, Bailey Flanigan, Nika Haghtalab, Meena Jagadeesan, Chara Podimata
TL;DR
This work reveals that even perfectly aligned learning objectives cannot guarantee positive user welfare, because probabilistic feedback can steer a recommender system toward promoting content with certain properties. By modeling content selection as a multi-armed bandit with arm-specific feedback rates, the authors quantify engagement via APC and FOC and introduce the concepts of feedback monotonicity and balance. They propose three black-box transformations (BB_Divide, BB_Pull, BB_DA) to study how probabilistic feedback shapes arm engagement and provide regret guarantees, supplemented by refined analyses and an empirical study of correlations between feedback rates and downstream effects. The results highlight the need to evaluate algorithms beyond regret, considering how feedback dynamics influence content exposure and user experience, with implications for platform design and policy.
Abstract
A common explanation for negative user impacts of content recommender systems is misalignment between the platform's objective and user welfare. In this work, we show that misalignment in the platform's objective is not the only potential cause of unintended impacts on users: even when the platform's objective is fully aligned with user welfare, the platform's learning algorithm can induce negative downstream impacts on users. The source of these user impacts is that different pieces of content may generate observable user reactions (feedback information) at different rates; these feedback rates may correlate with content properties, such as controversiality or demographic similarity of the creator, that affect the user experience. Since differences in feedback rates can impact how often the learning algorithm engages with different content, the learning algorithm may inadvertently promote content with such properties. Using the multi-armed bandit framework with probabilistic feedback, we examine the relationship between feedback rates and a learning algorithm's engagement with individual arms for different no-regret algorithms. We prove that no-regret algorithms can exhibit a wide range of dependencies: if the feedback rate of an arm increases, some no-regret algorithms engage with the arm more, some engage with it less, and others engage with it approximately the same number of times. From a platform design perspective, our results highlight the importance of looking beyond regret when measuring an algorithm's performance, and of assessing the nature of a learning algorithm's engagement with different types of content, as well as the resulting downstream impacts.
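The bandit setup with probabilistic feedback can be simulated in a few lines. The sketch below makes several assumptions. Rewards are Bernoulli. Each arm has a fixed feedback rate in (0, 1], so a pull reveals its reward only with that probability. The index is a standard UCB1 index computed only from observed rewards. UCB1 is a generic stand-in chosen for illustration. It is not one of the algorithms or black-box transformations analyzed in the paper, and the function name `ucb_with_probabilistic_feedback` is hypothetical. The code tracks pulls (engagement) and observations (feedback) separately, which lets you check how often the algorithm engages with each arm as a function of its feedback rate.

```python
import math
import random


def ucb_with_probabilistic_feedback(means, feedback_rates, horizon, seed=0):
    """Hypothetical sketch: UCB1 on Bernoulli arms where a pull of arm i
    reveals its reward only with probability feedback_rates[i] (assumed > 0).

    Returns (pulls, observations): how often each arm was engaged with,
    and how often its reward was actually observed.
    """
    rng = random.Random(seed)
    k = len(means)
    pulls = [0] * k          # engagement: number of times each arm is chosen
    obs = [0] * k            # feedback: number of rewards actually observed
    reward_sum = [0.0] * k   # sum of observed rewards only

    for t in range(1, horizon + 1):
        # Keep pulling an arm until at least one reward has been observed for it.
        unobserved = [i for i in range(k) if obs[i] == 0]
        if unobserved:
            arm = unobserved[0]
        else:
            # UCB1 index built from observed samples, not from raw pull counts.
            arm = max(
                range(k),
                key=lambda i: reward_sum[i] / obs[i]
                + math.sqrt(2 * math.log(t) / obs[i]),
            )
        pulls[arm] += 1
        reward = 1.0 if rng.random() < means[arm] else 0.0
        # The reward is realized either way, but only sometimes observed.
        if rng.random() < feedback_rates[arm]:
            obs[arm] += 1
            reward_sum[arm] += reward

    return pulls, obs


if __name__ == "__main__":
    # Same arm means; only the feedback rate of arm 0 differs between runs.
    for rate in (0.2, 0.9):
        pulls, obs = ucb_with_probabilistic_feedback(
            means=[0.5, 0.6], feedback_rates=[rate, 0.9], horizon=5000
        )
        print(f"feedback rate of arm 0 = {rate}: pulls={pulls}, observations={obs}")
```

Holding the means fixed and varying only `feedback_rates[0]` shows how engagement with that arm shifts for this one index rule. As the abstract notes, other no-regret algorithms can respond in the opposite direction or hardly at all, so a single simulation like this does not characterize the general behavior.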
