Online Bayesian Persuasion Without a Clue

Francesco Bacchiocchi, Matteo Bollini, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

TL;DR

The paper designs an algorithm that achieves sublinear regret with respect to an optimal signaling scheme, and provides a collection of lower bounds showing that the algorithm's guarantees are tight.

Abstract

We study online Bayesian persuasion problems in which an informed sender repeatedly faces a receiver, with the goal of influencing the receiver's behavior by providing payoff-relevant information. Previous works assume that the sender knows the prior distribution over states of nature, the receiver's utilities, or both. We relax these unrealistic assumptions by considering settings in which the sender knows nothing about either the prior or the receiver. We design an algorithm that achieves sublinear regret with respect to an optimal signaling scheme, and we provide a collection of lower bounds showing that the guarantees of this algorithm are tight. Our algorithm works by searching a suitable space of signaling schemes in order to learn the receiver's best responses. To do this, we leverage a non-standard representation of signaling schemes that allows us to overcome the challenge of not knowing the prior over states of nature or the receiver's utilities. Finally, our results also yield lower and upper bounds on the sample complexity of learning signaling schemes in a related Bayesian persuasion PAC-learning problem.
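To make the notion of a receiver's best response concrete, the following is a minimal sketch of the standard Bayesian persuasion computation: given a signal, the receiver forms a posterior by Bayes' rule and picks an action maximizing expected utility. The function name `receiver_best_response`, the matrix layout, and all numbers are assumptions made for illustration and are not taken from the paper. In the paper's setting the sender knows neither `prior` nor `utilities`; the sketch only shows the receiver's side, which the sender has to infer from observed actions.

```python
def receiver_best_response(prior, scheme, utilities, signal):
    """Return (best_action, posterior) for one observed signal.

    prior[t]        : probability of state t (hypothetical values below)
    scheme[t][s]    : probability the sender emits signal s in state t
    utilities[t][a] : receiver's utility for action a in state t
    """
    # Joint probability of each state together with the observed signal.
    joint = [p * row[signal] for p, row in zip(prior, scheme)]
    total = sum(joint)
    if total <= 0:
        raise ValueError("signal has zero probability under this scheme")

    # Bayes' rule: posterior over states given the signal.
    posterior = [j / total for j in joint]

    # Expected utility of every action under the posterior.
    n_actions = len(utilities[0])
    expected = [
        sum(posterior[t] * utilities[t][a] for t in range(len(prior)))
        for a in range(n_actions)
    ]

    # Ties are broken by lowest index here; this is an arbitrary choice.
    best = max(range(n_actions), key=lambda a: expected[a])
    return best, posterior


# Illustrative instance: 2 states, 2 signals, 2 actions.
prior = [0.5, 0.5]
scheme = [[0.8, 0.2],   # state 0 mostly emits signal 0
          [0.1, 0.9]]   # state 1 mostly emits signal 1
utilities = [[1.0, 0.0],  # in state 0 the receiver prefers action 0
             [0.0, 1.0]]  # in state 1 the receiver prefers action 1

for s in (0, 1):
    action, posterior = receiver_best_response(prior, scheme, utilities, s)
    print(s, action, posterior)
```

Changing the rows of `scheme` changes the posteriors each signal induces, and therefore which action the receiver plays. That dependence is what a sender can probe when it does not know the prior or the utilities.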

Paper Structure

This paper contains 37 sections, 34 theorems, 106 equations, 6 figures, 13 algorithms.

Key Result

Theorem 1

The regret attained by the paper's main algorithm is $R_T \le \widetilde{\mathcal{O}}\left( \binom{d+n}{d} n^{3/2} d^3 \sqrt{BT} \right)$.
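To get a feel for how this bound scales, the sketch below evaluates only its leading term, $\binom{d+n}{d} n^{3/2} d^3 \sqrt{BT}$, dropping the constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$. The function name `regret_bound_scale` is hypothetical, $B$ is treated as an opaque positive parameter because this summary does not define it, and the value `B = 1.0` is a placeholder rather than a number from the paper.

```python
import math


def regret_bound_scale(d, n, B, T):
    """Leading term of the Theorem 1 bound, without constants or log factors.

    d: number of states of nature
    n: number of receiver actions
    B: the parameter B from the theorem (left undefined in this summary)
    T: number of rounds
    """
    return math.comb(d + n, d) * n ** 1.5 * d ** 3 * math.sqrt(B * T)


# Instance size from Figure 1: d = 2 states of nature, n = 3 receiver actions.
# B = 1.0 is a placeholder value, not taken from the paper.
for T in (10**2, 10**4, 10**6):
    scale = regret_bound_scale(d=2, n=3, B=1.0, T=T)
    # The last column is the per-round average of the leading term.
    print(T, scale, scale / T)
```

Because the leading term grows as $\sqrt{T}$, its per-round average shrinks as $1/\sqrt{T}$, which is what sublinear regret means here. The binomial factor $\binom{d+n}{d}$, by contrast, grows quickly with the number of states and actions.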

Figures (6)

  • Figure 1: Representation of sets $\mathcal{X}^\square(a_i)$ and $\mathcal{X}^\triangle(a_i)$ for an instance with $d=2$ states of nature and $n=3$ receivers' actions.
  • Figure : Learn-to-Persuade-w/o-Clue
  • Figure : Build-Search-Space
  • Figure : Compute-Signaling
  • ...and 1 more figure

Theorems & Definitions (60)

  • Definition 1: Slice
  • Theorem 1
  • Lemma 1
  • Definition 2: Phase 1 clean event
  • Lemma 2
  • Lemma 3
  • Definition 3: Phase 2 clean event
  • Lemma 4
  • Theorem 2
  • Theorem 3
  • ...and 50 more