Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Brian Hu Zhang; Gabriele Farina; Ioannis Anagnostides; Federico Cacciamani; Stephen Marcus McAleer; Andreas Alexander Haupt; Andrea Celli; Nicola Gatti; Vincent Conitzer; Tuomas Sandholm

TL;DR

Optimal equilibria can be cast as minimax strategies in a zero-sum extensive-form game. This reformulation makes it possible to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria not only in empirical averages but also in iterates.

Abstract

We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria not only in empirical averages but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.
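The abstract's key step is that, once an optimal equilibrium is written as a minimax strategy in a zero-sum game, any zero-sum learning dynamics with last-iterate guarantees becomes applicable. As a minimal, generic sketch of what "convergence in iterates" means, the block below runs projected optimistic gradient descent-ascent (OGDA) on small zero-sum matrix games. This is a stand-in chosen for illustration, not the authors' method: the paper works with extensive-form games, and the names `ogda`, `project_simplex`, `payoff_grad_x`, `payoff_grad_y`, `duality_gap`, as well as the step size, are our own assumptions. The step size is kept small relative to the norm of the payoff matrix, since last-iterate guarantees for OGDA over polytopes require a sufficiently small step.

```python
# A generic sketch (not the paper's algorithm): projected optimistic gradient
# descent-ascent (OGDA) on a zero-sum matrix game  max_x min_y  x^T A y,
# where x and y are mixed strategies on probability simplices.

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # the largest k passing this test fixes the shift
    return [max(v_i - theta, 0.0) for v_i in v]


def payoff_grad_x(A, y):
    """Gradient of x^T A y with respect to x, i.e. A y."""
    return [sum(a_ij * y_j for a_ij, y_j in zip(row, y)) for row in A]


def payoff_grad_y(A, x):
    """Gradient of x^T A y with respect to y, i.e. A^T x."""
    return [sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0]))]


def duality_gap(A, x, y):
    """Sum of both players' best-response gains; zero exactly at a minimax equilibrium."""
    return max(payoff_grad_x(A, y)) - min(payoff_grad_y(A, x))


def ogda(A, x0, y0, eta=0.05, iters=5000):
    """Two-sequence projected OGDA; returns the LAST iterate, not a running average."""
    x_hat, y_hat = list(x0), list(y0)  # secondary sequence
    x, y = list(x0), list(y0)          # primary sequence, started at the initial point
    for _ in range(iters):
        gx_old, gy_old = payoff_grad_x(A, y), payoff_grad_y(A, x)
        # Optimistic step: reuse the previous gradient as a prediction of the next one.
        x = project_simplex([a + eta * g for a, g in zip(x_hat, gx_old)])  # x maximizes
        y = project_simplex([b - eta * g for b, g in zip(y_hat, gy_old)])  # y minimizes
        # Advance the secondary sequence with the gradient at the new point.
        gx, gy = payoff_grad_x(A, y), payoff_grad_y(A, x)
        x_hat = project_simplex([a + eta * g for a, g in zip(x_hat, gx)])
        y_hat = project_simplex([b - eta * g for b, g in zip(y_hat, gy)])
    return x, y


pennies = [[1, -1], [-1, 1]]  # matching pennies: unique equilibrium is uniform play
x, y = ogda(pennies, [0.8, 0.2], [0.3, 0.7])
print(x, y, duality_gap(pennies, x, y))  # last iterate should be near (1/2, 1/2)
```

On matching pennies or rock-paper-scissors, the returned last iterate approaches the uniform equilibrium and `duality_gap` approaches zero. That gap is a matrix-game analogue of the exploitability measure described in the Figure 1 caption.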

Paper Structure (75 sections, 19 theorems, 21 equations, 3 figures, 5 tables, 2 algorithms)

Introduction
Summary of Our Results
Experimental results
Related work
Preliminaries
Revelation principle
Lagrangian relaxations and a reduction to a zero-sum game
"Direct" Lagrangian
Last-iterate convergence
Thresholding and binary search
Experimental evaluation
Optimal equilibria in extensive-form games
Exact sequential auction design
Scalable sequential auction design via deep reinforcement learning
Conclusions
...and 60 more sections

Key Result

Proposition 3.0

The saddle-point problem (eq:saddle-point) admits a finite saddle-point solution $(\bm\mu^*, \bm x^*, \lambda^*)$. Moreover, for every fixed $\lambda > \lambda^*$, problems (eq:saddle-point) and (eq:lp) have the same value and the same set of optimal solutions.
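To make the role of the strict inequality $\lambda > \lambda^*$ concrete, here is a minimal sketch on a hypothetical two-variable LP of our own choosing; it is not the paper's (eq:lp), and the names `C`, `A`, `penalized_objective`, and `solve_on_grid` are illustrative. It assumes that, with $\lambda$ held fixed, the saddle-point problem reduces to an exact-penalty objective $c^\top\mu - \lambda \max(0, a^\top\mu)$; in the paper, $\lambda$ multiplies a maximum over deviations $\bm x$, which we collapse here to a single linear constraint. Exhaustive search over an exact rational grid of the simplex keeps the comparison free of floating-point noise.

```python
from fractions import Fraction

# Hypothetical toy LP, used only to illustrate the statement (it is not eq:lp):
#   maximize  c . mu   subject to  a . mu <= 0,   mu in the probability simplex,
# with c = (1, 0), a = (1, -1). Its optimum is mu* = (1/2, 1/2) with value 1/2,
# and the corresponding Lagrange multiplier is lambda* = 1/2.
C = (Fraction(1), Fraction(0))
A = (Fraction(1), Fraction(-1))


def penalized_objective(mu, lam):
    """c.mu - lam * max(0, a.mu): the constraint enters the objective with a FIXED weight lam."""
    c_dot = sum(c_i * m_i for c_i, m_i in zip(C, mu))
    a_dot = sum(a_i * m_i for a_i, m_i in zip(A, mu))
    return c_dot - lam * max(Fraction(0), a_dot)


def solve_on_grid(lam, n=1000):
    """Exact maximization over the grid mu = (k/n, 1 - k/n); returns (value, maximizers)."""
    grid = [(Fraction(k, n), 1 - Fraction(k, n)) for k in range(n + 1)]
    values = [penalized_objective(mu, lam) for mu in grid]
    best = max(values)
    return best, [mu for mu, v in zip(grid, values) if v == best]


for lam in (Fraction(1, 4), Fraction(1, 2), Fraction(1), Fraction(2)):
    value, maximizers = solve_on_grid(lam)
    print(f"lambda={lam}: value={value}, maximizers={len(maximizers)}, "
          f"mu_1 from {maximizers[0][0]} to {maximizers[-1][0]}")
```

In this toy, $\lambda = 1/4 < \lambda^*$ is too weak: the penalized optimum moves to $\mu = (1, 0)$ with value $3/4$. At $\lambda = \lambda^* = 1/2$ the value is already $1/2$, but every $\mu_1 \in [1/2, 1]$ is optimal, so the solution sets differ. For $\lambda = 1$ and $\lambda = 2$, the unique maximizer is $\mu^* = (1/2, 1/2)$ with value $1/2$, matching the constrained LP.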

Figures (3)

Figure 1: Exploitability is the sum, over both bidders, of the utility each bidder can gain by best-responding to the mechanism; zero exploitability corresponds to incentive compatibility. In a sequential auction with budgets, our method achieves higher revenue than second-price auctions and better incentive compatibility than a first-price auction.
Figure 2: Left: configuration 1 (used for $\texttt{RS212}$, $\texttt{RS213}$). Right: configuration 2 (used for $\texttt{RS222}$, $\texttt{RS223}$). In both cases, the positions of the two drivers are chosen at random at the beginning of the game, every edge has unit cost, and the reward for each node is shown in curly brackets.
Figure : Optimal equilibrium value for correlated equilibrium concepts. 'Pl. 1' is the utility of Player 1 in the Player 1-optimal equilibrium; 'Pl. 2' and 'Pl. 3' are defined analogously. In two-player games, 'SW' is the welfare of the welfare-maximizing equilibrium. (Of course, these three values may come from three different equilibria.) The three-player games are zero-sum, so optimizing welfare is meaningless there: the welfare is always zero.

Theorems & Definitions (29)

Definition 2.1
Definition 2.2
Proposition 3.0
Proposition 3.1
proof
Corollary 3.1
Corollary 3.1: Improved rates via optimism
Theorem 3.2: Last-iterate convergence to optimal equilibria in general games
Proposition 3.3
proof
...and 19 more
