Designing Inferable Signaling Schemes for Bayesian Persuasion

Caleb Probine, Mustafa O. Karabag, Ufuk Topcu

TL;DR

This work studies Bayesian persuasion when the receiver must infer the signaling scheme from repeated interactions rather than observe a committed strategy. It bounds the sender's value loss between the known-commitment and inference settings, showing that the gap grows with the size of the signal space and shrinks as the receiver's posteriors move away from decision boundaries. It also proves sample-complexity lower bounds showing that the sender can need more samples in persuasion than the leader in a Stackelberg game. To design inferable schemes, it introduces projected stochastic gradient descent (SGD) on the inference objective and a regularized information-design approach based on a boundedly-rational receiver model; both outperform naive known-commitment signaling, with SGD strongest in low-interaction regimes. Numerical experiments on a flower-game family, random games, and a safety-alert scenario show that schemes designed for inference can use fewer signals while making the receiver's optimal actions more distinct, which yields higher sender utility under inference. The results offer practical strategies for constructing learnable signaling schemes when the receiver has few interactions to learn from.

Abstract

In Bayesian persuasion, an informed sender, who observes a state, commits to a randomized signaling scheme that guides a self-interested receiver's actions. Classical models assume the receiver knows the commitment. We instead study the setting where the receiver infers the scheme from repeated interactions. We bound the sender's performance loss relative to the known-commitment case by a term that grows with the signal space size and shrinks as the receiver's optimal actions become more distinct. We then lower bound the number of samples required for the sender to approximately achieve their known-commitment performance in the inference setting. We show that the sender requires more samples in persuasion than the leader in a Stackelberg game, which includes commitment but lacks signaling. Motivated by these bounds, we propose two methods for designing inferable signaling schemes: stochastic gradient descent (SGD) on the sender's inference-setting utility, and optimization with a boundedly-rational receiver model. SGD performs best in low-interaction regimes, but modeling the receiver as boundedly-rational and tuning the rationality constant still provides a flexible method for designing inferable schemes. Finally, we apply SGD to a safety alert example and show that it finds schemes that have fewer signals and make citizens' optimal actions more distinct compared to the known-commitment case.
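
The inference setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the function name `simulate_inference`, the array layout, the uniform belief used before a signal has been seen, and the first-index tie-breaking are all assumptions made for illustration. It returns the sender's average reward over `k` rounds, which may differ from the paper's exact definition of the inference-setting utility.

```python
import numpy as np

def simulate_inference(prior, scheme, u_S, u_R, k, rng):
    """Average sender reward over k rounds against an empirically learning receiver.

    prior:  shape (n_states,), distribution over states
    scheme: shape (n_states, n_signals), scheme[w, s] = P(signal s | state w)
    u_S, u_R: shape (n_states, n_actions), sender / receiver utilities
    """
    n_states, n_signals = scheme.shape
    # counts[s, w] = how often state w was observed after signal s
    counts = np.zeros((n_signals, n_states))
    total = 0.0
    for _ in range(k):
        state = rng.choice(n_states, p=prior)
        signal = rng.choice(n_signals, p=scheme[state])
        seen = counts[signal].sum()
        if seen > 0:
            # empirical posterior over states given this signal
            belief = counts[signal] / seen
        else:
            # assumption: uniform belief before the signal has been observed
            belief = np.full(n_states, 1.0 / n_states)
        # receiver best-responds to the estimate (ties go to the lowest index)
        action = int(np.argmax(belief @ u_R))
        total += u_S[state, action]
        # receiver then observes the state and updates the estimate
        counts[signal, state] += 1
    return total / k
```

With a fully revealing scheme (`scheme = np.eye(2)`) and aligned utilities, the receiver can err at most once per signal under these assumptions, so the average approaches 1 quickly. Schemes whose posteriors sit near decision boundaries would be expected to need many more rounds before the empirical best response stabilizes.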

Paper Structure

This paper contains 21 sections, 8 theorems, 85 equations, 7 figures.

Key Result

Proposition 1

If $\mathsf{Range}(u_S) \subset [0,1]$, then

Figures (7)

  • Figure 1: Bayesian persuasion in the inference setting. At each round, the receiver takes action using their empirical estimate of the scheme, and then updates the estimate using the observed state.
  • Figure 2: The optimal scheme for a flower game $G^{1/6,3}$ induces posteriors near decision boundaries. Red stars are posteriors $y_{s}^{\pi}$, the blue dot is the prior, and white dots are empirical distributions sampled from a posterior. We color the simplex by $f(y) = \sum_{\omega \in \Omega} u_S(\omega,a(y)) y(\omega)$. For a signal $s$, the sender's known-commitment expected reward, conditional on the signal, is $f(y_{s}^{\pi})$. The known-commitment-optimal posteriors lie in regions where $f(y)$ is large. For posteriors at the inner triangle's edge, the receiver is likely to take actions in the inference setting that give the sender zero reward.
  • Figure 3: Schemes found with SGD outperform regularization for small $k$, while for large $k$, we can tune performance through the rationality constant. We plot estimates of $IR_k$ for policies found using each method for the flower game. Dashed lines correspond to SGD, where we optimize $IR_{k_{opt}}$. Solid lines correspond to regularization with different $\lambda$ values. We evaluate SGD schemes averaged over the last $10$ SGD iterates, and we tune step sizes because different $k_{opt}$ and $\lambda$ values lead to different smoothness properties. SGD schemes perform best up to $k \approx 300$. Beyond $k = 300$, regularization performs well, and we can tune $\lambda$ to trade off long-term against short-term performance. The known-commitment solution, i.e., $\lambda= \infty$, performs poorly because its posteriors lie on receiver decision boundaries, and thus, for any $k$, the receiver takes poor actions for the sender with high probability.
  • Figure 4: Optimal schemes in the inferability setting have small signal spaces. We plot $IR_{300}(\pi_t)$ when optimizing $IR_{300}$ with SGD, where $\pi_t$ is the $t^{\text{th}}$ iterate. Define $\tilde{S}(\pi)$ as the size of the smallest set $\hat{S}$ of signals so that $\sum_{s\in \hat{S}} p^\pi(s) \langle u_{S}(:,a^\pi(s)), y_{s}^{\pi}\rangle$ is at least $0.99 BPR(\pi)$, i.e., the number of signals the receiver must learn for the sender to recover $BPR(\pi)$. We mark the values of $\tilde{S}(\pi_t)$. SGD improves the scheme by reducing $\tilde{S}(\pi)$.
  • Figure 5: Optimizing for boundedly-rational receivers provides a flexible regularization framework for the inference setting. We plot estimates of $IR_k$ for schemes derived with various $\lambda$ values on random games. We average over the approximately $60$ games on which projected gradient descent converges for all $\lambda$, and we tune step sizes for different $\lambda$ values. Decreasing $\lambda$ trades long-term performance for gains when $k$ is small. Known-commitment-optimal schemes, i.e., $\lambda = \infty$, perform poorly for all $k$ values because posteriors may sit at decision boundaries.
  • ...and 2 more figures
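
The quantity $\tilde{S}(\pi)$ defined in the Figure 4 caption can be computed directly from a scheme. The sketch below is illustrative only and rests on several assumptions not stated in this summary: that $BPR(\pi)$ equals the sum of the per-signal terms $p^\pi(s) \langle u_{S}(:,a^\pi(s)), y_{s}^{\pi}\rangle$ over all signals, that $a^\pi(s)$ is the receiver's best response to $y_{s}^{\pi}$ with ties broken toward the lowest index, and that $u_S$ is non-negative so that picking the largest terms first gives the smallest set. The name `effective_signal_count` is hypothetical.

```python
import numpy as np

def effective_signal_count(prior, scheme, u_S, u_R, frac=0.99):
    """Smallest number of signals whose terms sum to at least frac * BPR.

    prior:  shape (n_states,)
    scheme: shape (n_states, n_signals), scheme[w, s] = P(signal s | state w)
    u_S, u_R: shape (n_states, n_actions), assumed non-negative for u_S
    """
    # joint[w, s] = P(state w, signal s) = p(s) * y_s(w)
    joint = prior[:, None] * scheme
    # receiver best response per signal; normalizing by p(s) does not change the argmax
    actions = np.argmax(joint.T @ u_R, axis=1)
    # contrib[s] = p(s) * <u_S(:, a(s)), y_s>
    contrib = (joint * u_S[:, actions]).sum(axis=0)
    bpr = contrib.sum()  # assumed known-commitment sender reward
    if bpr <= 0:
        return 0
    # take the largest contributions first and stop once the target is reached
    cum = np.cumsum(np.sort(contrib)[::-1])
    idx = int(np.searchsorted(cum, frac * bpr, side="left"))
    return min(idx + 1, len(contrib))
```

For example, a scheme that sends a third signal with total probability 0.0025 while two other signals carry the remaining mass would give a count of 2 under these assumptions, since the rare signal contributes less than 1% of the reward.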

Theorems & Definitions (9)

  • Proposition 1
  • Corollary 1
  • Definition 1: Flower game
  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Lemma 4
  • Lemma 5
  • Proposition 2