Steering No-Regret Learners to a Desired Equilibrium

Brian Hu Zhang; Gabriele Farina; Ioannis Anagnostides; Federico Cacciamani; Stephen Marcus McAleer; Andreas Alexander Haupt; Andrea Celli; Nicola Gatti; Vincent Conitzer; Tuomas Sandholm

Abstract

A mediator observes no-regret learners playing an extensive-form game repeatedly across $T$ rounds. The mediator attempts to steer players toward some desirable predetermined equilibrium by giving (nonnegative) payments to players. We call this the steering problem. The steering problem captures several problems of interest, among them equilibrium selection and information design (persuasion). If the mediator's budget is unbounded, steering is trivial because the mediator can simply pay the players to play desirable actions. We study two bounds on the mediator's payments: a total budget and a per-round budget. If the mediator's total budget does not grow with $T$, we show that steering is impossible. However, we show that it is enough for the total budget to grow sublinearly with $T$, that is, for the average payment to vanish. When players' full strategies are observed at each round, we show that constant per-round budgets permit steering. In the more challenging setting where only trajectories through the game tree are observable, we show that steering is impossible with constant per-round budgets in general extensive-form games, but possible in normal-form games or if the per-round budget may itself depend on $T$. We also show how our results generalize to the case where the equilibrium is computed online while steering takes place. We supplement our theoretical positive results with experiments highlighting the efficacy of steering in large games.
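The idea of steering with vanishing average payments can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes a 2x2 normal-form stag hunt with illustrative payoffs, two Hedge (multiplicative-weights) learners with full feedback, and a hypothetical payment rule that pays a player who chooses Stag a bonus `alpha = T ** -0.25` plus whatever that player loses because the opponent is not playing Stag. The names `U`, `simulate`, `eta`, and `p0` are assumptions introduced for this example.

```python
import math

# Row player's payoff U[my_action][other_action]; 0 = Stag, 1 = Hare.
# These numbers are illustrative assumptions, not taken from the paper.
U = [[4.0, 0.0],
     [3.0, 3.0]]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(T, steer, eta=0.1, p0=0.05):
    """Run two Hedge learners for T rounds.

    Returns (final Stag probabilities, total expected payment).
    """
    # Hypothetical payment schedule: the direct bonus shrinks as T grows.
    alpha = T ** -0.25 if steer else 0.0
    # Hedge over two actions, tracked as the log-odds of playing Stag.
    z = [math.log(p0 / (1 - p0))] * 2
    total_payment = 0.0
    for _ in range(T):
        x = [sigmoid(z[0]), sigmoid(z[1])]
        new_z = []
        for i in (0, 1):
            xo = x[1 - i]  # opponent's probability of Stag
            u_stag = U[0][0] * xo + U[0][1] * (1 - xo)
            u_hare = U[1][0] * xo + U[1][1] * (1 - xo)
            pay_if_stag = 0.0
            if steer:
                # Bonus plus compensation for the opponent not playing Stag;
                # nonnegative because U[0][0] >= u_stag.
                pay_if_stag = alpha + (U[0][0] - u_stag)
            total_payment += x[i] * pay_if_stag
            # Hedge update: log-odds move by eta times the utility gap.
            new_z.append(z[i] + eta * (u_stag + pay_if_stag - u_hare))
        z = new_z
    return [sigmoid(v) for v in z], total_payment
```

Under these assumptions, calling `simulate(T, steer=True)` and `simulate(T, steer=False)` from the same initial point lets one compare where the learners end up and how the average payment `total_payment / T` behaves as `T` grows. Once both learners play Stag, the compensation term vanishes and only the small bonus `alpha` is paid each round.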

Paper Structure (30 sections, 24 theorems, 46 equations, 4 figures, 1 table)

Introduction
Summary of our Results
Related Work
$k$-implementation
Steering to near-optimal equilibria
Strategizing against no-regret learners
Preliminaries
The Steering Problem
Steering in Normal-Form Games
Steering in Extensive-Form Games
Steering with Full Feedback
Steering in the Trajectory-Feedback Setting
Lower bound
Upper bound
Other Equilibrium Notions and Online Steering
...and 15 more sections

Key Result

Proposition 1.1

For any fixed total budget $B$, there is a time horizon $T$ large enough that the steering problem is impossible.
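As a rough illustration of how this contrasts with the sublinear-budget regime described in the abstract (the specific rate $T^{3/4}$ is an assumed example, not taken from the paper): a total budget may grow without bound while the average payment per round still vanishes,

$$B(T) = T^{3/4} \quad\Longrightarrow\quad \frac{B(T)}{T} = T^{-1/4} \to 0 \quad \text{as } T \to \infty.$$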

Figures (4)

Figure 1: Left: An extensive-form version of a stag hunt. Chance plays uniformly at random at the root node, and the dotted line connecting the two nodes of Player 2 indicates an infoset: Player 2 cannot distinguish the two nodes. The game has two equilibria: one at the bottom-left corner, and one at the top-right corner (star). The latter is Pareto-dominant. Introducing vanishing realized payments alters the gradient landscape, steering players to the optimal equilibrium (star) instead of the suboptimal one (opposite corner). The capital letters show the players' initial strategies. Lighter color indicates higher welfare and the star shows the highest-welfare equilibrium. Further details are in \ref{sec:figures}.
Figure 2: The counterexample for \ref{th:bandit-lower-bound}, for $n=3$. Chance always plays uniformly at random. Infosets are linked by dotted lines (all nodes belonging to the same player are in the same infoset).
Figure 3: Sample experimental results. The blue line in each figure is the social welfare (left y-axis) of the players with steering enabled. The green dashed line is the social welfare without steering. The yellow line gives the payment (right y-axis) paid to each player. The flat black line denotes the welfare of the optimal equilibrium. Each panel indicates the game and the equilibrium concept (in this figure, always EFCE). In all cases, the first ten iterations are a "burn-in" period during which no payments are issued; steering only begins after that.
Figure 4: The trajectories of EXP3 algorithms under different random initializations and vanishing payments. Trajectories with the same color correspond to the same initialization but under different realizations of the players' sampled actions.

Theorems & Definitions (42)

Proposition 1.1: Informal version of \ref{prop:imposs-boundedpayments}
Theorem 1.2: Informal version of \ref{th:normalform}
Theorem 1.3: Informal version of \ref{th:offline-then-steer}
Theorem 1.4: Informal version of \ref{th:bandit-lower-bound}
Theorem 1.5: Informal version of \ref{th:offline-then-steer}
Theorem 1.6: Informal version of \ref{th:advice-necessary}
Theorem 1.7: Informal version of \ref{th:bayes-correlated}
Theorem 1.8: Informal version of \ref{th:online-steer}
Definition 2.1
Definition 3.1: Steering Problem for Pure-Strategy Nash Equilibrium
...and 32 more
