Incentivized Exploration via Filtered Posterior Sampling

Anand Kalvit, Aleksandrs Slivkins, Yonatan Gur

TL;DR

The paper studies incentivized exploration (IE) in sequential-agent settings where a principal can influence exploratory actions through information signals. It proposes filtered posterior sampling, a semantics-consistent Thompson Sampling variant, and proves Bayesian incentive-compatibility (BIC) under a warm-start spectral-diversity condition, unifying analyses across private/public types and correlated priors. The work derives corollaries for private types, informative recommendations, sleeping bandits, combinatorial semi-bandits, and linear bandits with public types, and also shows that other native algorithms (e.g., UCB, filtered least squares) admit similar BIC guarantees under analogous conditions. It provides instance-dependent guarantees and demonstrates the broad applicability of posterior sampling as a general IE tool in complex, heterogeneous recommendation settings. Overall, the framework offers a unified, principled approach to incentivized exploration with practical implications for modern, heterogeneous recommender systems.

Abstract

We study "incentivized exploration" (IE) in social learning problems where the principal (a recommendation algorithm) can leverage information asymmetry to incentivize sequentially-arriving agents to take exploratory actions. We identify posterior sampling, an algorithmic approach that is well known in the multi-armed bandits literature, as a general-purpose solution for IE. In particular, we expand the existing scope of IE in several practically-relevant dimensions, from private agent types to informative recommendations to correlated Bayesian priors. We obtain a general analysis of posterior sampling in IE which allows us to subsume these extended settings as corollaries, while also recovering existing results as special cases.
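
For readers unfamiliar with posterior sampling, the following is a minimal sketch of its standard form (Thompson Sampling) on a Bernoulli bandit with uniform Beta(1, 1) priors. It is not the paper's filtered variant: it has no agent types, no messaging policy, and no incentive constraints. The function name `thompson_sampling` and its parameters are hypothetical, chosen only for this illustration.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Standard posterior sampling for a Bernoulli bandit (illustrative only).

    Assumes independent Beta(1, 1) priors on each arm's mean reward.
    This is NOT the filtered posterior sampling algorithm from the paper.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms  # Beta alpha parameters (prior pseudo-count of 1)
    failures = [1] * n_arms   # Beta beta parameters (prior pseudo-count of 1)
    pulls = [0] * n_arms

    for _ in range(horizon):
        # Draw one plausible mean reward per arm from its current posterior.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        # Choose the arm that looks best under the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate Beta-Bernoulli posterior update.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1

    return pulls, successes, failures


if __name__ == "__main__":
    pulls, _, _ = thompson_sampling([0.1, 0.9], horizon=2000, seed=1)
    print("pulls per arm:", pulls)
```

Each round, an arm is chosen with probability equal to its posterior probability of being the best arm, so exploration arises from posterior uncertainty rather than from a separate exploration schedule.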

Paper Structure (38 sections, 24 theorems, 63 equations, 2 figures)

Key Result

Theorem 1

Assume that $\mathtt{\delta}_{0}(\mathscr{Q})>0$, as per Eq. (eqn:primitives3). Fix $\varepsilon>0$ and suppose that the spectral diversity of the warm-up data satisfies $\lambda_{[T_0]}\gtrsim \Lambda(\varepsilon) := \left(D/\varepsilon^2\right)\log\left(2/\mathtt{\delta}_{0}(\mathscr{Q})\right)$ ... Then, filtered posterior sampling is $g(\varepsilon)$-BIC, with ...

Figures (2)

  • Figure 1: Protocol: Incentivized Exploration
  • Figure 2: "Semantics-consistent" messaging policy

Theorems & Definitions (43)

  • Remark 2.1
  • Remark 2.2
  • Definition 1
  • Remark 2.3
  • Definition 2: Menu-consistency
  • Remark 3.1
  • Remark 3.2
  • Theorem 1: General Guarantee
  • Remark 3.3
  • Corollary 1: private types
  • ...and 33 more