Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Brian Hu Zhang, Gabriele Farina, Ioannis Anagnostides, Federico Cacciamani, Stephen Marcus McAleer, Andreas Alexander Haupt, Andrea Celli, Nicola Gatti, Vincent Conitzer, Tuomas Sandholm

TL;DR

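Optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation makes it possible to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria not only in empirical averages but also in iterates.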

Abstract

We introduce a new approach for computing optimal equilibria via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation makes it possible to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria not only in empirical averages but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.

Paper Structure (75 sections, 19 theorems, 21 equations, 3 figures, 5 tables, 2 algorithms)

Key Result

Proposition 3.0

The problem eq:saddle-point admits a finite saddle-point solution $(\bm\mu^*, \bm x^*, \lambda^*)$. Moreover, for every fixed $\lambda > \lambda^*$, the problems eq:saddle-point and eq:lp have the same value and the same set of optimal solutions.
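
The equations eq:lp and eq:saddle-point are not reproduced on this page, so the following is only a toy sketch under an assumed form: eq:lp is taken to be $\max_{\bm\mu} \bm c^\top \bm\mu$ subject to $\max_{\bm x} \bm\mu^\top \mathbf{A} \bm x \le 0$, and eq:saddle-point to be $\max_{\bm\mu} \min_{\bm x} \bm c^\top \bm\mu - \lambda \bm\mu^\top \mathbf{A} \bm x$. The vector `C`, the matrix `A`, the grid, and all function names are hypothetical and chosen so that the threshold works out to $\lambda^* = 1/2$ by hand. The sketch checks that a fixed multiplier above the threshold reproduces the constrained optimum, while a multiplier below it does not.

```python
# Toy instance; every number here is an illustrative assumption, not from the paper.
# mu = (p, 1 - p) is the designer's strategy, x is a deviator's mixed strategy.
C = (1.0, 0.0)                   # objective c^T mu = p
A = ((0.0, 1.0), (0.0, -1.0))    # deviation gain is mu^T A x

def objective(p):
    mu = (p, 1.0 - p)
    return C[0] * mu[0] + C[1] * mu[1]

def best_deviation_gain(p):
    # A linear function over the simplex is maximized at a pure strategy,
    # so it suffices to scan the columns of A. Here this equals max(0, 2p - 1).
    mu = (p, 1.0 - p)
    return max(mu[0] * A[0][j] + mu[1] * A[1][j] for j in range(2))

def constrained_optimum(grid):
    # Assumed form of eq:lp: maximize the objective over strategies with no profitable deviation.
    feasible = [p for p in grid if best_deviation_gain(p) <= 1e-12]
    best = max(feasible, key=objective)
    return best, objective(best)

def penalized_optimum(grid, lam):
    # Assumed form of eq:saddle-point with lambda fixed:
    # min over x of (c^T mu - lam * mu^T A x) equals c^T mu - lam * best_deviation_gain.
    def value(p):
        return objective(p) - lam * best_deviation_gain(p)
    best = max(grid, key=value)
    return best, value(best)

GRID = [i / 1000 for i in range(1001)]

if __name__ == "__main__":
    print("constrained:", constrained_optimum(GRID))
    print("lambda=1.00:", penalized_optimum(GRID, 1.0))
    print("lambda=0.25:", penalized_optimum(GRID, 0.25))
```

In this toy instance the penalized objective is $p - \lambda \max(0, 2p - 1)$, which for $\lambda > 1/2$ decreases past $p = 1/2$ and so agrees with the constrained problem; for $\lambda < 1/2$ the maximizer moves to the infeasible point $p = 1$.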

Figures (3)

  • Figure 1: Exploitability is measured by summing both bidders' best-response values against the mechanism. Zero exploitability corresponds to incentive compatibility. In a sequential auction with budgets, our method achieves higher revenue than a second-price auction and better incentive compatibility than a first-price auction.
  • Figure 2: Left: configuration 1 (used for $\texttt{RS212}$, $\texttt{RS213}$). Right: configuration 2 (used for $\texttt{RS222}$, $\texttt{RS223}$). In both cases the position of the two drivers is randomly chosen at the beginning of the game, edge costs are unitary, and the reward for each node is indicated between curly brackets.
  • Figure : Optimal equilibrium value for correlated equilibrium concepts. 'Pl. 1' is the utility for Player 1 in the Player 1-optimal equilibrium; 'Pl. 2' and 'Pl. 3' are defined analogously. In two-player games, 'SW' is the welfare of the welfare-maximizing equilibrium. (These values may, of course, come from different equilibria.) The three-player games are zero-sum, so optimizing welfare is not meaningful there (the welfare is always zero).

Theorems & Definitions (29)

  • Definition 2.1
  • Definition 2.2
  • Proposition 3.0
  • Proposition 3.1
  • proof
  • Corollary 3.1
  • Corollary 3.1: Improved rates via optimism
  • Theorem 3.2: Last-iterate convergence to optimal equilibria in general games
  • Proposition 3.3
  • proof
  • ...and 19 more