Mediator Interpretation and Faster Learning Algorithms for Linear Correlated Equilibria in General Extensive-Form Games

Brian Hu Zhang, Gabriele Farina, Tuomas Sandholm

TL;DR

This paper provides several contributions shedding light on the fundamental nature of linear-swap regret, and develops state-of-the-art no-regret algorithms for computing linear correlated equilibria, both in theory and in practice.

Abstract

A recent paper by Farina & Pipis (2023) established the existence of uncoupled no-linear-swap regret dynamics with polynomial-time iterations in extensive-form games. The equilibrium points reached by these dynamics, known as linear correlated equilibria, are currently the tightest known relaxation of correlated equilibrium that can be learned in polynomial time in any finite extensive-form game. However, their properties remain vastly unexplored, and their computation is onerous. In this paper, we provide several contributions shedding light on the fundamental nature of linear-swap regret. First, we show a connection between linear deviations and a generalization of communication deviations in which the player can make queries to a "mediator" who replies with action recommendations, and, critically, the player is not constrained to match the timing of the game as would be the case for communication deviations. We coin this latter set the untimed communication (UTC) deviations. We show that the UTC deviations coincide precisely with the linear deviations, and therefore that any player minimizing UTC regret also minimizes linear-swap regret. We then leverage this connection to develop state-of-the-art no-regret algorithms for computing linear correlated equilibria, both in theory and in practice. In theory, our algorithms achieve polynomially better per-iteration runtimes; in practice, our algorithms represent the state of the art by several orders of magnitude.

Paper Structure

This paper contains 22 sections, 14 theorems, 9 equations, 6 figures, 3 tables.

Key Result

Theorem 1

The untimed communication deviations are precisely the linear deviations.
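
As a small illustration of what a linear deviation is, consider the simplest case, where a player's strategy set is a probability simplex. There, the linear maps from the set to itself are the column-stochastic matrices, whose vertices are the swap functions, so linear-swap regret reduces to ordinary swap regret. The sketch below is not the paper's algorithm: the function name `linear_swap_regret_simplex` and the input format are assumptions made for this example, and it only evaluates the regret of a fixed play history rather than minimizing it.

```python
def linear_swap_regret_simplex(strategies, utilities):
    """Linear-swap regret of a play history over the probability simplex.

    strategies: list of mixed strategies x_t (each a list summing to 1)
    utilities:  list of utility vectors u_t (same length as each x_t)
    Both names and the input format are hypothetical, chosen for illustration.
    """
    n = len(strategies[0])
    # G[b][a] = sum_t u_t[b] * x_t[a]: utility collected if the mass that was
    # placed on action a had been moved to action b instead.
    G = [[0.0] * n for _ in range(n)]
    for x, u in zip(strategies, utilities):
        for b in range(n):
            for a in range(n):
                G[b][a] += u[b] * x[a]
    # Utility actually obtained: the identity map keeps mass on the diagonal.
    realized = sum(G[a][a] for a in range(n))
    # The objective is linear in the column-stochastic matrix, so the maximum
    # is attained at a vertex: each column a picks its best target b.
    best = sum(max(G[b][a] for b in range(n)) for a in range(n))
    return best - realized


# Playing action 0 when action 1 was good, then action 1 when action 0 was good.
history_x = [[1.0, 0.0], [0.0, 1.0]]
history_u = [[0.0, 1.0], [1.0, 0.0]]
print(linear_swap_regret_simplex(history_x, history_u))
```

In general extensive-form games the strategy set is a sequence-form polytope rather than a simplex, and the set of linear deviations is correspondingly richer; the theorem above characterizes that set through the mediator interpretation.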

Figures (6)

  • Figure 1: An example extensive-form game in which communication deviations are a strict subset of UTC deviations. There are two players, P1 ($\blacktriangle$) and P2 ($\blacktriangledown$). Infosets for both players are labeled with capital letters (e.g., A) and joined by dotted lines. Actions are labeled with lowercase letters and subscripts (e.g., $a_1$). P1's utility is labeled on each terminal node. P2's utility is zero everywhere (not labeled). Boxes are chance nodes, at which chance plays uniformly at random.
  • Figure 2: A part of the UTC decision problem for $\blacktriangle$ corresponding to the same game. Nodes labeled $\blacktriangle$ are decision points for $\blacktriangle$; boxes are observation points. "..." denotes that the part of the decision problem following that edge has been omitted. Terminal nodes are unmarked. Red edge labels indicate interactions with the mediator; blue edge labels indicate interactions with the game. The profitable untimed deviation discussed in the paper's example section is indicated by the thick lines. The first action taken in that profitable deviation, $c_2$, is not legal for a timed deviator, because a timed deviator must query the mediator once before taking its first action. The matrices (lower-left corner) are the pair of matrices $(\mathbf A, \mathbf B)$ corresponding to that same deviation. All blank entries are 0.
  • Figure 3: Experimental comparison between our dynamics and those of Farina & Pipis (2023) for approximating a linear correlated equilibrium in extensive-form games. Each algorithm was run for a maximum of $100,\!000$ iterations or 6 hours, whichever was hit first. Runs that were terminated due to the time limit are marked with a square $\blacksquare$.
  • Figure 4: A visual depiction of the argument that the paper's convex-hull corollary cannot generalize to all polytopes. The affine map $\phi$ maps the large blue polygon onto the small orange polygon, and $\phi$ is a vertex of the set of linear maps from polygon $ABCD$ to itself, yet $\phi(C)$ is not a vertex of $ABCD$.
  • Figure 5: Another example. The notation is shared with Figure 1. In this example, $\blacktriangle$'s strategy set is equivalent to a simplex, so the linear deviations coincide with its swap deviations. As such, we will not bother to depict the UTC decision problem or matrices.
  • ...and 1 more figure

Theorems & Definitions (22)

  • Theorem
  • Theorem: Faster linear-swap regret minimization
  • Definition 2.1
  • Theorem 2.2: Gordon et al. (2008)
  • Definition 3.1
  • Definition 3.2
  • Theorem 4.1
  • Theorem 6.1: CFR for $\Phi_{\textsc{Lin}}$, special case of Zhang et al. (2023)
  • Theorem 6.2
  • Corollary A.1
  • ...and 12 more