Influence of Recommender Systems on Users: A Dynamical Systems Analysis
Prabhat Lankireddy, Jayakrishnan Nair, D Manjunath
TL;DR
This work develops a formal model of how a contextual linear bandit recommender system (RS) interacts with users whose preferences drift toward the recommendations they receive. It uses the ODE method of stochastic approximation to derive a deterministic dynamical system that asymptotically captures the coupled evolution of the algorithm state and user preferences, in both single-user and multi-user settings. The analysis shows how the exploration-exploitation tradeoff shapes long-term preferences, including filter bubbles and polarization under high exploitation, and it identifies conditions under which the RS can still learn the true preferences despite the model mismatch. The results highlight the potential for feedback loops in recommender environments and provide a rigorous framework for understanding and mitigating these unintended consequences.
Abstract
We analyze the unintended effects that recommender systems have on the preferences of the users they are learning about. We consider a contextual multi-armed bandit recommendation algorithm that learns optimal product recommendations based on user and product attributes. It is well known that the sequence of recommendations affects user preferences. However, typical learning algorithms treat user attributes as static and disregard the impact of their recommendations on user preferences. Our interest is in the effect of this mismatch between the modeling assumption of a static environment and the reality of an environment that evolves in response to the recommendations. To perform this analysis, we introduce a model for the coupled evolution of a linear bandit recommender system (RS) and its users, whose preferences are drawn towards the recommendations made by the algorithm. We describe a method, grounded in stochastic approximation theory, for deriving a dynamical system that asymptotically approximates the mean behavior of the stochastic model. The resulting dynamical system captures the coupled evolution of the population preferences and the learning algorithm, and analyzing it gives insight into the long-term properties of both. Under certain conditions, we show that the RS is able to learn the population preferences in spite of the model mismatch. We also characterize the relation between the model parameters and the long-term preferences of users. A key observation is that the exploration-exploitation tradeoff used by the recommendation algorithm significantly affects the long-term preferences of users. Algorithms that exploit more can polarize user preferences, leading to the well-known filter bubble phenomenon.
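The feedback loop described above can be illustrated with a toy simulation. The sketch below is not the model analyzed in this work. It is a minimal single-user example under assumptions chosen only for illustration: three items with orthonormal attribute vectors, an epsilon-greedy linear bandit whose estimate `theta` is updated by a stochastic-gradient step, and a user preference vector `u` that moves a fixed fraction `alpha` toward each recommended item. The names `simulate`, `eps`, `alpha`, `lr`, and `noise_std`, and all numeric values, are hypothetical.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def simulate(eps, steps=200, alpha=0.05, lr=0.1, noise_std=0.0, seed=0):
    """Toy coupled evolution of a linear bandit and one user.

    eps       : probability of recommending a uniformly random item (exploration)
    alpha     : assumed rate at which the user drifts toward the recommended item
    lr        : assumed step size of the recommender's estimate update
    noise_std : standard deviation of the reward noise
    Returns the final user preference vector and per-item recommendation counts.
    """
    rng = random.Random(seed)
    # Assumed item attribute vectors (orthonormal, for readability).
    items = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    u = [0.5, 0.3, 0.2]        # assumed initial user preference
    theta = [0.0, 0.0, 0.0]    # recommender's estimate of the preference
    counts = [0] * len(items)

    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(items))  # explore
        else:
            # exploit: item with the highest estimated reward
            a = max(range(len(items)), key=lambda i: dot(theta, items[i]))
        x = items[a]
        counts[a] += 1

        # Noisy linear reward from the user's *current* preference.
        reward = dot(u, x) + rng.gauss(0.0, noise_std)

        # The recommender updates as if the user were static.
        err = reward - dot(theta, x)
        theta = [t + lr * err * xi for t, xi in zip(theta, x)]

        # The user's preference is drawn toward the recommended item.
        u = [ui + alpha * (xi - ui) for ui, xi in zip(u, x)]

    return u, counts


if __name__ == "__main__":
    for eps in (0.0, 0.3, 1.0):
        u, counts = simulate(eps)
        print(eps, [round(v, 3) for v in u], counts)
```

In this toy setting, pure exploitation (`eps=0.0`) with noiseless rewards recommends the first item at every step, and the user's preference collapses onto that item, a simple analogue of a filter bubble. With more exploration the preference stays spread across the items. How closely this mirrors the behavior of the actual model depends on details that the sketch does not reproduce.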
