Efficient $Φ$-Regret Minimization with Low-Degree Swap Deviations in Extensive-Form Games

Brian Hu Zhang, Ioannis Anagnostides, Gabriele Farina, Tuomas Sandholm

TL;DR

The paper studies efficient computation of correlated equilibria in extensive-form games via parameterized $\Phi$-regret minimization, using sets of deviations that interpolate between linear-swap regret and full swap regret. It introduces $k$-mediator deviations and degree-$k$ polynomial swap deviations, and shows that average regret at most $\epsilon$ is attainable in $N^{O(k)}/\epsilon^2$ rounds for $k$-mediator deviations and in $N^{O((kd)^3)}/\epsilon^2$ rounds for degree-$k$ deviations, where $N$ is the dimension of the strategy space and $d$ is the depth of the game tree. For fixed $k$, the latter bound is quasipolynomial when the game tree is balanced ($d = \mathsf{polylog} N$). A key technical innovation is replacing hard fixed-point computations with approximate fixed points in expectation, using consistent deviation maps such as the behavioral map $\beta$ and the Carathéodory map $\gamma$; this yields no-regret learners whose per-round running time is also bounded (for example, $N^{O(k)}/\epsilon$ for $k$-mediator deviations, per Theorem 3.1). These results give a parameterized tractability framework for correlated equilibrium notions in extensive-form games and algorithms for computing EFCE and related equilibria, with the most favorable bounds for shallow or balanced game trees.

Abstract

Recent breakthrough results by Dagan, Daskalakis, Fishelson and Golowich [2023] and Peng and Rubinstein [2023] established an efficient algorithm attaining at most $ε$ swap regret over extensive-form strategy spaces of dimension $N$ in $N^{\tilde O(1/ε)}$ rounds. On the other extreme, Farina and Pipis [2023] developed an efficient algorithm for minimizing the weaker notion of linear-swap regret in $\mathsf{poly}(N)/ε^2$ rounds. In this paper, we develop efficient parameterized algorithms for regimes between these two extremes. We introduce the set of $k$-mediator deviations, which generalize the untimed communication deviations recently introduced by Zhang, Farina and Sandholm [2024] to the case of having multiple mediators, and we develop algorithms for minimizing the regret with respect to this set of deviations in $N^{O(k)}/ε^2$ rounds. Moreover, by relating $k$-mediator deviations to low-degree polynomials, we show that regret minimization against degree-$k$ polynomial swap deviations is achievable in $N^{O(kd)^3}/ε^2$ rounds, where $d$ is the depth of the game, assuming a constant branching factor. For a fixed degree $k$, this is polynomial for Bayesian games and quasipolynomial more broadly when $d = \mathsf{polylog} N$ -- the usual balancedness assumption on the game tree.
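
The last sentence of the abstract can be unpacked with a short calculation. The sketch below reads the bound as $N^{O((kd)^3)}$, as written in the TL;DR, and treats Bayesian games as having constant depth $d = O(1)$; that depth assumption and the exponent $c$ are introduced here for illustration and are not stated in the abstract.

```latex
\begin{align*}
\text{Constant depth, } d = O(1):\quad
  & N^{O((kd)^3)} = N^{O(k^3)} = \mathsf{poly}(N)
    \quad \text{for fixed } k, \\
\text{Balanced tree, } d = \log^{c} N:\quad
  & N^{O((kd)^3)} = 2^{O(k^3 \log^{3c} N \cdot \log N)}
    = 2^{O(k^3 \log^{3c+1} N)}
    = 2^{\mathsf{polylog}(N)}
    \quad \text{for fixed } k.
\end{align*}
```

The second line is quasipolynomial in $N$; dividing by $\epsilon^2$ does not change either classification for fixed $\epsilon$.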

Paper Structure (37 sections, 41 theorems, 35 equations, 2 figures, 1 table, 1 algorithm)

Key Result

Theorem 3.1

There is an online algorithm incurring (average) $\Phi^k_{\mathrm{DT}}$-regret at most $\epsilon$ in $N^{O(k)}/\epsilon^2$ rounds with a per-round running time of $N^{O(k)}/\epsilon$.

Figures (2)

  • Figure 1: An example of a tree-form decision problem. Decision points are black squares with white text labels; observation points are white squares. Edges are labeled with action names, which are numbers. Pure strategies in this decision problem are identified with vectors $\bm{x} = (x_1, x_2, x_3, x_4, x_5) \in \{0, 1\}^5$ satisfying $1-x_1=x_2+x_3=x_4+x_5$.
  • Figure 2: A representation of the deviation $\phi(\bm{x}) = (x_1+x_3, x_2x_4, x_2x_5, x_2, 0)$ (discussed in \ref{sec:consistent-map}) in the decision problem $\mathcal{X}$ of Figure 1, as a strategy in $\mathcal{X} \otimes \bar{\mathcal{X}} \otimes \bar{\mathcal{X}}$, i.e., with $k=2$ mediators. (For an example of a one-mediator deviation, see Zhang24:Mediator.) Again, black squares are decision nodes and white squares are observation nodes. Nodes are labeled with their state representations: the state in $\mathcal{X}$ first (in blue), and the two mediator states after (in red). Similarly, blue edge labels indicate interactions with the decision problem (i.e., playing actions and receiving observations in $\mathcal{X}$), and red edge labels indicate interactions with the mediators (i.e., querying and receiving action recommendations from the mediators). Redundant edges (such as those in which the decision problem in $\mathcal{X}$ has terminated) are omitted. The deviation is shown in thick black lines. For example, $\phi_2(\bm{x}) = x_2 x_4$ because the only state in which the deviator plays action 2 is when the mediator state is (2,4). $\phi_1(\bm{x}) = x_1 + x_3$ because the deviator plays action 1 at mediator states (1,1) and (3,0), which would give the formula $\phi_1(\bm{x}) = x_1^2 + x_3 x_0$ (where $x_0 := 1-x_1$), but one can easily check that $x_1^2 + x_3 x_0 = x_1 + x_3$ for all $\bm{x} \in \mathcal{X}$.
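
The identity at the end of the Figure 2 caption can be checked by brute force over the pure strategies described in the Figure 1 caption. The following is a minimal sketch, not code from the paper: the function names `pure_strategies`, `phi`, and `phi1_two_mediator_form` are hypothetical, and the check covers only pure (0/1) strategies, which is an assumption about what "for all $\bm{x} \in \mathcal{X}$" is meant to range over.

```python
from itertools import product


def pure_strategies():
    """Binary vectors (x1, ..., x5) with 1 - x1 == x2 + x3 == x4 + x5,
    the constraint stated in the Figure 1 caption."""
    return [
        x for x in product((0, 1), repeat=5)
        if 1 - x[0] == x[1] + x[2] == x[3] + x[4]
    ]


def phi(x):
    """The deviation from the Figure 2 caption:
    phi(x) = (x1 + x3, x2*x4, x2*x5, x2, 0)."""
    x1, x2, x3, x4, x5 = x
    return (x1 + x3, x2 * x4, x2 * x5, x2, 0)


def phi1_two_mediator_form(x):
    """First coordinate as read off the two-mediator tree:
    x1^2 + x3*x0, with x0 := 1 - x1."""
    x1, _, x3, _, _ = x
    x0 = 1 - x1
    return x1 * x1 + x3 * x0


strategies = pure_strategies()
for x in strategies:
    # The degree-2 expression agrees with the linear one on every pure strategy.
    assert phi(x)[0] == phi1_two_mediator_form(x)
    # phi maps each pure strategy back to a pure strategy of the same problem.
    assert phi(x) in strategies
```

The identity holds because $x_1^2 = x_1$ on binary inputs and $x_3 x_0 = x_3 (x_2 + x_3) = x_3$, since $x_2$ and $x_3$ cannot both equal 1.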

Theorems & Definitions (67)

  • Remark 2.1
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Corollary 4.0
  • Theorem 4.1: Midrijanis04:Exact
  • Definition 4.1
  • Definition 4.1
  • Remark A.1: Swap versus internal regret
  • ...and 57 more