Incentive-Aware Recommender Systems in Two-Sided Markets

Xiaowu Dai; Wenlu Xu; Yuan Qi; Michael I. Jordan

TL;DR

This paper addresses incentive-compatible exploration in two-sided online marketplaces where users may opt to exploit current information. It introduces two algorithms, ARP and MARP, that combine information design and randomized recommendations to induce exploration while respecting agents’ opportunity costs, achieving sublinear regret and ex-post fairness. ARP covers scenarios with known costs, while MARP handles private, unknown costs, with theoretical guarantees and empirical validation on synthetic and real data. The work has practical implications for platforms like social networks and ride-hailing services, and points to extensions for contextual bandits and adaptive clinical trials, with code available for replication.

Abstract

Online platforms in the Internet Economy commonly incorporate recommender systems that recommend products (or "arms") to users (or "agents"). A key challenge in this domain arises from myopic agents who are naturally incentivized to exploit by choosing the optimal arm based on current information, rather than exploring various alternatives to gather information that benefits the collective. We propose a novel recommender system that aligns with agents' incentives while achieving asymptotically optimal performance, as measured by regret in repeated interactions. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets, where the interactions of agents and arms are facilitated by recommender systems on online platforms. This model incorporates incentive constraints induced by agents' opportunity costs. In scenarios where opportunity costs are known to the platform, we show the existence of an incentive-compatible recommendation algorithm. This algorithm pools recommendations between a genuinely good arm and an unknown arm using a randomized and adaptive strategy. Moreover, when these opportunity costs are unknown, we introduce an algorithm that randomly pools recommendations across all arms, utilizing the cumulative loss from each arm as feedback for strategic exploration. We demonstrate that both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation. All code for using the proposed algorithms and reproducing results is made available on GitHub.
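The pooling idea described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's ARP or MARP algorithm. It assumes Bernoulli rewards and a fixed exploration probability, and it ignores opportunity costs and incentive constraints entirely. The names `pooled_recommendation`, `simulate`, and `explore_prob` are hypothetical and chosen for illustration only. It shows only how randomizing between the empirically best arm and a little-sampled arm trades regret for information.

```python
import random


def pooled_recommendation(means, counts, explore_prob, rng):
    """Recommend the empirically best arm, or with probability
    explore_prob one of the least-sampled arms.

    Assumption: a fixed explore_prob stands in for the adaptive,
    incentive-calibrated rate that the paper derives.
    """
    if rng.random() < explore_prob:
        least = min(counts)
        candidates = [i for i, c in enumerate(counts) if c == least]
        return rng.choice(candidates)
    # Ties are broken in favor of the lowest arm index.
    return max(range(len(means)), key=lambda i: means[i])


def simulate(true_means, horizon, explore_prob, seed=0):
    """Run the pooled policy on Bernoulli arms.

    Returns the cumulative expected regret and the per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        arm = pooled_recommendation(means, counts, explore_prob, rng)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean reward.
        means[arm] += (reward - means[arm]) / counts[arm]
        regret += best_mean - true_means[arm]
    return regret, counts


if __name__ == "__main__":
    regret, counts = simulate([0.3, 0.5, 0.7], horizon=1000, explore_prob=0.1)
    print(f"regret={regret:.1f}, counts={counts}")
```

Because the exploration probability is fixed here, regret in this sketch grows linearly with the horizon. The sublinear regret claimed in the abstract depends on the adaptive strategy that the paper develops.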

Paper Structure

This paper contains 43 sections, 7 theorems, 37 equations, 7 figures, 4 tables, 2 algorithms.

Introduction
Related Work
Mechanism Design and Incentivized Exploration
Matching Markets
Social Learning
Bayesian Persuasion
Our Contributions
Model
Interaction Protocol
Agent's Incentive
Information Design for Recommendation
Mechanism Design for Recommendation
Designer's Objective
Optimal Policy Under Known Opportunity Costs
Adaptive Recommendation Policy
...and 28 more sections

Key Result

Theorem 1

Suppose that the assumption in Eq. \ref{eqn:positiveplat} holds and the parameters $\lambda$, $\theta_\tau$, and $k$ are chosen according to Eqs. \ref{eqn:choiceoflambda}, \ref{eqn:thetatau}, and \ref{eqn:choiceofk}, respectively. Then ARP in Algorithm \ref{alg:adaptivesampling} guarantees the agent's incentive in Eq. \ref{eqn:incentivesct} for an ...

Figures (7)

Figure 1: The recommendation process mediated by the designer for online marketplaces.
Figure 2: Illustrating paths of $p_{i,t}$ for first-best and second-best policies.
Figure 3: (Second-Best) The exploration rate $p_{i,t}$ in Eq. \ref{eqn:choiceofLi} for arm $i>1$.
Figure 4: The mean regret of ARP and alternative algorithms for Section \ref{sec:bernoullibandit}, based on 500 data replications. The three plots correspond to $c_*=0.05$, $c_*=0.10$, and $c_*=0.15$, respectively.
Figure 5: The mean regret of MARP and alternative algorithms for Section \ref{sec:bernoullibandit}, based on 500 data replications. The three plots correspond to $c_t\sim \mathrm{Beta}(0.9, 0.9)$, $c_t\sim \mathrm{Beta}(1.1, 1.0)$, and $c_t\sim \mathrm{Beta}(1.0, 1.1)$, respectively.
...and 2 more figures

Theorems & Definitions (7)

Theorem 1
Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
