Prior-Agnostic Incentive-Compatible Exploration

Ramya Ramalingam; Osbert Bastani; Aaron Roth

TL;DR

The paper shows that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts in an approximate Bayes Nash equilibrium, even in dynamic environments in which agents have conflicting prior beliefs and the mechanism designer has no knowledge of any agent's beliefs.

Abstract

In bandit settings, optimizing long-term regret metrics requires exploration, which corresponds to sometimes taking myopically sub-optimal actions. When a long-lived principal merely recommends actions to be executed by a sequence of different agents (as in an online recommendation platform), this creates an incentive misalignment: exploration is "worth it" for the principal but not for the agents. Prior work studies regret minimization under the constraint of Bayesian Incentive-Compatibility in a static stochastic setting with a fixed and common prior shared amongst the agents and the algorithm designer. We show that (weighted) swap regret bounds on their own suffice to cause agents to faithfully follow forecasts in an approximate Bayes Nash equilibrium, even in dynamic environments in which agents have conflicting prior beliefs and the mechanism designer has no knowledge of any agent's beliefs. To obtain these bounds, it is necessary to assume that the agents have some degree of uncertainty not just about the rewards but also about their arrival time, i.e., their relative position in the sequence of agents served by the algorithm. We instantiate our abstract bounds with concrete algorithms for guaranteeing adaptive and weighted regret in bandit settings.
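
To make the central quantity concrete, the following is a minimal sketch of how a weighted swap regret could be computed from a transcript of recommended actions. It assumes the standard textbook form, in which the regret is the largest weighted gain achievable by remapping each played action to a fixed alternative; the paper's own Definition 3.5 is not reproduced here and may differ, for example in normalization. The function name `weighted_swap_regret` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def weighted_swap_regret(
    actions: Sequence[int],
    means: Sequence[Sequence[float]],
    weights: Sequence[float],
    num_actions: int,
) -> float:
    """Weighted swap regret under the standard (assumed) definition.

    actions[t] is the action played at step t, means[t][b] is the mean
    reward of action b at step t, and weights[t] is the weight on step t.
    """
    # gain[a][b]: total weighted gain from playing b on every step where a was played.
    gain: List[List[float]] = [[0.0] * num_actions for _ in range(num_actions)]
    for a, mu, w in zip(actions, means, weights):
        for b in range(num_actions):
            gain[a][b] += w * (mu[b] - mu[a])
    # The best swap function picks, independently for each played action a,
    # the replacement b with the largest gain (at least 0, since b = a is allowed).
    return sum(max(row) for row in gain)


# Illustrative transcript: two actions, three steps.
example_actions = [0, 0, 1]
example_means = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
uniform = weighted_swap_regret(example_actions, example_means, [1.0, 1.0, 1.0], 2)
early_only = weighted_swap_regret(example_actions, example_means, [0.5, 0.5, 0.0], 2)
```

In this sketch the weights play the role that a temporal belief over arrival times would play: putting all of the weight on a subset of steps measures the regret as seen by an agent who believes they arrived within that subset.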

Paper Structure (12 sections, 17 theorems, 94 equations)

Introduction
Related Work
Background and Notation
Regret Gives Approximate Incentive-Compatibility
Preliminaries
Main Theorems
Auxiliary Lemmas
Proofs of Main Theorems
Concrete Bounds for Structured Temporal Beliefs
Conclusion
Additional Proofs
Extensions for Approximated Temporal Beliefs

Key Result

Lemma 4.1

Fix a sequence of reward mean vectors $\mu = (\mu_1, \mu_2, \ldots, \mu_T)$ that satisfies $\|\mu_{t+1} - \mu_t\|_{\infty} \leq \rho$ for each $t \in [T-1]$. Then for any two actions $a, b \in A$, any time-step $t \in [T]$, and any temporal belief $\mathcal{D} \in \Delta([T])$,

Theorems & Definitions (42)

Definition 3.1: Reward Belief
Definition 3.2: Full Arrival Belief, Temporal Belief
Definition 3.3: Transcript
Definition 3.4: Weighted external regret
Definition 3.5: Weighted swap-regret
Definition 3.6: Weighted pseudo-regret, swap-regret
Definition 3.7: Bayesian Incentive Compatibility
Definition 3.8: Recommendation Game
Remark 3.1
Definition 3.9: Incentive-Compatible Bayes Nash Equilibrium
...and 32 more
