Incentive-Aware Recommender Systems in Two-Sided Markets
Xiaowu Dai, Wenlu Xu, Yuan Qi, Michael I. Jordan
TL;DR
This paper addresses incentive-compatible exploration in two-sided online marketplaces, where myopic users tend to exploit current information rather than explore. It introduces two algorithms, ARP and MARP, that combine information design and randomized recommendations to induce exploration while respecting agents’ opportunity costs, achieving sublinear regret and ex-post fairness. ARP covers scenarios in which the platform knows these costs, while MARP handles private, unknown costs; both come with theoretical guarantees and empirical validation on synthetic and real data. The work has practical implications for platforms such as social networks and ride-hailing services, and points to extensions for contextual bandits and adaptive clinical trials. Code is available for replication.
Abstract
Online platforms in the Internet Economy commonly incorporate recommender systems that recommend products (or "arms") to users (or "agents"). A key challenge in this domain arises from myopic agents who are naturally incentivized to exploit by choosing the optimal arm based on current information, rather than exploring various alternatives to gather information that benefits the collective. We propose a novel recommender system that aligns with agents' incentives while achieving asymptotically optimal performance, as measured by regret in repeated interactions. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets, where the interactions of agents and arms are facilitated by recommender systems on online platforms. This model incorporates incentive constraints induced by agents' opportunity costs. In scenarios where opportunity costs are known to the platform, we show the existence of an incentive-compatible recommendation algorithm. This algorithm pools recommendations between a genuinely good arm and an unknown arm using a randomized and adaptive strategy. Moreover, when these opportunity costs are unknown, we introduce an algorithm that randomly pools recommendations across all arms, utilizing the cumulative loss from each arm as feedback for strategic exploration. We demonstrate that both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation. All code for using the proposed algorithms and reproducing results is made available on GitHub.
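The two pooling ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's ARP or MARP: the function names, the fixed `explore_prob`, the learning rate `eta`, and the Bernoulli reward model are all assumptions made for illustration, and the sketch does not enforce any incentive-compatibility or fairness constraint. The first part mixes the empirically best arm with the least-sampled arm; the second part uses standard exponential weighting over cumulative losses as a stand-in for loss-driven randomization across all arms.

```python
import math
import random


def pooled_recommendation(best_arm, explore_arm, explore_prob, rng):
    """Recommend explore_arm with probability explore_prob, else best_arm."""
    return explore_arm if rng.random() < explore_prob else best_arm


def simulate_pooling(true_means, explore_prob=0.1, rounds=5000, seed=0):
    """Simulate pooled recommendations on Bernoulli arms (illustrative only).

    explore_prob is a hypothetical fixed constant here; the abstract describes
    an adaptive strategy, which this sketch does not reproduce.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for _ in range(rounds):
        # Empirical means; arms with no samples yet are treated as 0.
        means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        best = max(range(k), key=lambda i: means[i])
        # The least-sampled arm plays the role of the "unknown" arm.
        explore = min(range(k), key=lambda i: counts[i])
        arm = pooled_recommendation(best, explore, explore_prob, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


def loss_weighted_probabilities(cumulative_losses, eta=0.5):
    """Exponential weights over cumulative losses (lower loss, higher weight).

    A standard technique swapped in for illustration, not the paper's rule.
    """
    min_loss = min(cumulative_losses)  # shift for numerical stability
    weights = [math.exp(-eta * (loss - min_loss)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(simulate_pooling([0.1, 0.9]))
    print(loss_weighted_probabilities([1.0, 3.0]))
```

In this toy setting, most recommendations concentrate on the better arm while the other arm still receives a steady share of exploratory pulls, and the loss-weighted distribution shifts probability toward arms with smaller cumulative loss.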
