The Fault in Our Recommendations: On the Perils of Optimizing the Measurable
Omar Besbes, Yash Kanoria, Akshit Kumar
TL;DR
The paper demonstrates a fundamental misalignment between engagement maximization and user utility in recommendation systems. It analyzes a stylized infinite-horizon model with popular and niche items, showing that engagement-optimized policies can underperform on utility, especially when the value of niche items is uncertain but potentially high for a minority of users. The authors introduce PEAR, a utility-aware exploratory policy, and prove that near-optimal utility can be achieved with only modest sacrifices in engagement, and that this advantage grows as platforms become more forward-looking. Robustness checks with general distributions and prior-free policies (DICE) show that, in many regimes, exploration can substantially improve utility without large engagement penalties, highlighting a practical path to better long-term discovery in recommender systems.
Abstract
Recommendation systems are widespread and, through customized recommendations, promise to match users with options they will like. To that end, data on engagement is collected and used. Most recommendation systems are ranking-based: they rank and recommend items according to predicted engagement. However, engagement signals are often only a crude proxy for utility, as data on the latter is rarely collected or available. This paper explores the following question: by optimizing for measurable proxies, are recommendation systems at risk of significantly under-delivering on utility? If so, how can one improve utility, which is seldom measured? To study these questions, we introduce a model of repeated user consumption in which, at each interaction, users select between an outside option and the best option from a recommendation set. Our model accounts for user heterogeneity, with the majority preferring "popular" content and a minority favoring "niche" content. The system initially lacks knowledge of individual user preferences but can learn them by observing users' choices over time. Our theoretical and numerical analyses demonstrate that optimizing for engagement can lead to significant utility losses. Instead, we propose a utility-aware policy that initially recommends a mix of popular and niche content. As the platform becomes more forward-looking, our utility-aware policy achieves the best of both worlds: near-optimal utility and near-optimal engagement simultaneously. Our study elucidates an important feature of recommendation systems: given the ability to suggest multiple items, one can perform significant exploration without incurring significant reductions in engagement. By recommending high-risk, high-reward items alongside popular items, systems can enhance discovery of high-utility items without significantly affecting engagement.
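To make the choice rule concrete, the following is a minimal toy sketch of a single interaction, in which a user takes the best item in the recommendation set only if it beats the outside option. It is not the paper's model or its policies: the function names (`consume`, `item_values`), the two-item recommendation sets, the deterministic item values, and every number below are assumptions chosen purely for illustration.

```python
# Toy illustration only. All names and numeric values here are hypothetical
# assumptions, not taken from the paper.

OUTSIDE = 0.3           # assumed value of the outside option
POPULAR = 0.5           # assumed value of a popular item for every user
NICHE_FOR_FAN = 0.9     # assumed value of a niche item for the minority
NICHE_FOR_OTHERS = 0.0  # assumed value of a niche item for the majority


def consume(recommended_values, outside_value):
    """One interaction: pick the best recommended item if it beats the outside option."""
    best = max(recommended_values)
    engaged = best > outside_value
    utility = best if engaged else outside_value
    return engaged, utility


def item_values(rec_set, likes_niche):
    """Map item labels ("popular" or "niche") to this user's values."""
    niche = NICHE_FOR_FAN if likes_niche else NICHE_FOR_OTHERS
    return [POPULAR if item == "popular" else niche for item in rec_set]


if __name__ == "__main__":
    for likes_niche in (False, True):
        for rec_set in (["popular", "popular"], ["popular", "niche"]):
            engaged, utility = consume(item_values(rec_set, likes_niche), OUTSIDE)
            print(f"likes_niche={likes_niche} set={rec_set} "
                  f"engaged={engaged} utility={utility}")
```

Under these assumed values, both recommendation sets engage both user types, but only the mixed set lets a niche-preferring user reach the higher-utility item. This mirrors, in a deliberately simplified way, the observation that recommending several items leaves room to explore without giving up engagement.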
