Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Yuantong Li; Guang Cheng; Xiaowu Dai

TL;DR

This work formalizes dynamic online recommendation in two-sided markets as the Dynamic Bayesian Incentive-Compatible Recommendation Protocol (DBICRP) and introduces a two-stage algorithm (RCB) that couples incentivized exploration with offline learning for exploitation. Theoretical results establish that, under Gaussian priors and regularity conditions, RCB achieves $O(\sqrt{KdT})$ regret while satisfying Bayesian incentive compatibility, with a detailed analysis of cold-start sample requirements. Empirically, the approach shows strong incentive gains, sublinear regret, and robustness in simulations and on a real-world warfarin dosing dataset, highlighting its practical viability for incentive-aware recommendation. The framework advances principled incentive-aware online learning in settings with heterogeneous, self-interested users and informative contexts, offering a scalable method for dynamic preference learning on online platforms.

Abstract

Recommender systems play a crucial role in internet economies by connecting users with relevant products or services. However, designing effective recommender systems faces two key challenges: (1) the exploration-exploitation tradeoff in balancing new product exploration against exploiting known preferences, and (2) dynamic incentive compatibility in accounting for users' self-interested behaviors and heterogeneous preferences. This paper formalizes these challenges into a Dynamic Bayesian Incentive-Compatible Recommendation Protocol (DBICRP). To address the DBICRP, we propose a two-stage algorithm (RCB) that integrates incentivized exploration with an efficient offline learning component for exploitation. In the first stage, our algorithm explores available products while maintaining dynamic incentive compatibility to determine sufficient sample sizes. The second stage employs inverse proportional gap sampling integrated with an arbitrary machine learning method to ensure sublinear regret. Theoretically, we prove that RCB achieves $O(\sqrt{KdT})$ regret and satisfies Bayesian incentive compatibility (BIC) under a Gaussian prior assumption. Empirically, we validate RCB's strong incentive gain, sublinear regret, and robustness through simulations and a real-world application on personalized warfarin dosing. Our work provides a principled approach for incentive-aware recommendation in online preference learning settings.
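The abstract names inverse proportional gap sampling for the second stage but does not spell out the rule. As a minimal sketch, assuming it resembles the inverse-gap-weighting scheme used in SquareCB (Foster and Rakhlin, 2020), the snippet below turns predicted rewards for $K$ products into a sampling distribution: each non-greedy product gets probability $1/(K + \gamma \cdot \text{gap})$ and the greedy product keeps the remaining mass. The function name `inverse_gap_weights`, the scale parameter `gamma`, and the example predictions are hypothetical; RCB's exact weighting and its schedule for `gamma` may differ.

```python
import numpy as np


def inverse_gap_weights(predicted_rewards, gamma):
    """Turn predicted rewards for K arms into a sampling distribution.

    Hypothetical sketch of inverse-gap weighting: non-greedy arms get
    probability 1 / (K + gamma * gap); the greedy arm keeps the rest.
    """
    y = np.asarray(predicted_rewards, dtype=float)
    k = y.size
    best = int(np.argmax(y))
    gaps = y[best] - y  # gap to the empirically best arm (always >= 0)
    probs = 1.0 / (k + gamma * gaps)
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()  # remaining mass goes to the greedy arm
    return probs


rng = np.random.default_rng(0)
# Hypothetical predictions from any fitted regressor for K = 3 products.
p = inverse_gap_weights([0.2, 0.5, 0.4], gamma=10.0)
arm = rng.choice(len(p), p=p)  # product recommended this round
# p is approximately [1/6, 7/12, 1/4]
```

Because each non-greedy probability is at most $1/K$, the greedy product always receives at least $1/K$. With `gamma = 0` the distribution is uniform, and larger `gamma` concentrates mass on the greedy product. The predicted rewards can come from any regressor, which is one way to keep the sampling rule independent of the learning method.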

Paper Structure (34 sections, 14 theorems, 69 equations, 7 figures, 1 table, 2 algorithms)

Introduction
Recommendation Protocol
Algorithms
Cold Start Stage
Exploitation Stage
Theory
Regularity Conditions
Dynamic Bayesian Incentive Compatible Constraint
Regret Upper Bound
Experiments
Simulation Studies
Real Data
Appendix
Additional Related Works
Incentivized exploration
...and 19 more sections

Key Result

Theorem 1

Under the assumptions running from the constant requirement on the gap and lower probability through the minimum-eigenvalue condition, and with a normally distributed prior, if the parameters $\mathsf{N}$ and $L$ exceed some prior-dependent constant and the platform follows the $\texttt{RCB}$ algorithm, then it preserves … and the exploitation stage starts at $m_{0}(\epsilon) \geq \lceil 2 + \log_{2} \mathsf{N}(\epsilon) \ldots$
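The abstract states that RCB satisfies Bayesian incentive compatibility under a Gaussian prior. The paper's $\epsilon$-DBIC definition is not reproduced on this page, so the sketch below instead checks the standard static BIC condition from the incentivized-exploration literature: for every recommended arm $a$ and every alternative $b$, $\mathbb{E}[\mu_a - \mu_b \mid \text{rec} = a] \ge 0$. It estimates the smallest such slack by Monte Carlo with independent Gaussian priors and Gaussian reward noise. The names `bic_slack` and `posterior_greedy`, the assumption that every arm already has `n_obs` past observations, and all numbers are hypothetical.

```python
import numpy as np


def bic_slack(recommend, prior_mean, prior_sd, noise_sd, n_obs, n_sims=200_000, seed=0):
    """Monte Carlo estimate of min over a, b != a of E[mu_a - mu_b | rec = a].

    A non-negative value means the rule passes this (static) BIC check.
    """
    rng = np.random.default_rng(seed)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_sd = np.asarray(prior_sd, dtype=float)
    k = prior_mean.size
    # Draw true mean rewards from independent Gaussian priors.
    mu = rng.normal(prior_mean, prior_sd, size=(n_sims, k))
    # Sample mean of n_obs past rewards per arm (Gaussian noise with sd noise_sd).
    xbar = mu + rng.normal(0.0, noise_sd / np.sqrt(n_obs), size=(n_sims, k))
    rec = np.asarray(recommend(xbar))  # recommended arm for each simulated history
    slack = np.inf
    for a in range(k):
        mask = rec == a
        if not mask.any():
            continue  # the condition only applies to arms that are ever recommended
        cond_mean = mu[mask].mean(axis=0)  # estimate of E[mu | rec = a]
        others = np.delete(cond_mean, a)
        slack = min(slack, cond_mean[a] - others.max())
    return slack


def posterior_greedy(prior_mean, prior_sd, noise_sd, n_obs):
    """Hypothetical rule: recommend the arm with the highest Gaussian posterior mean."""
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_prec = 1.0 / np.asarray(prior_sd, dtype=float) ** 2
    data_prec = n_obs / noise_sd**2

    def recommend(xbar):
        post_mean = (prior_prec * prior_mean + data_prec * xbar) / (prior_prec + data_prec)
        return np.argmax(post_mean, axis=1)

    return recommend


m, s = [0.5, 0.0], [1.0, 1.0]
greedy_slack = bic_slack(posterior_greedy(m, s, 1.0, 5), m, s, 1.0, 5)  # positive
push_slack = bic_slack(lambda x: np.ones(len(x), dtype=int), m, s, 1.0, 5)  # about -0.5
```

Recommending the arm with the highest posterior mean yields positive slack, because the recommendation reveals nothing beyond the posterior itself. Always pushing the a-priori worse arm yields a slack near the prior gap of $-0.5$, which is the kind of violation an exploration stage must avoid while maintaining incentive compatibility.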

Figures (7)

Figure 1: Incentive gain (left) and cumulative regret (right) of Setting 1 (upper) and Setting 2 (lower).
Figure 2: Fraction of Incorrect Decision
Figure 3: Gain (top) and Regret (bottom) of Setting 2.
Figure 4: Gain (top) and Regret (bottom) of Setting 2.
Figure 5: Gain (top) and Regret (bottom) of Setting 2 with $\mathsf{N} = 10^{2}$.
...and 2 more figures

Theorems & Definitions (27)

Definition 1: $\epsilon$-DBIC
Definition 2
Theorem 1
Theorem 2
Proof
Proof
Lemma 1
Lemma 2
Corollary 1
Proof
...and 17 more
