Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers

Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel

TL;DR

This work advances multi-agent learning by introducing Joint Policy-Space Response Oracles (JPSRO) for n-player, general-sum extensive-form games and advocating correlated-equilibrium meta-solvers, especially Maximum Gini Correlated Equilibrium (MG(C)CE). MG(C)CE frames the equilibrium selection problem as a tractable quadratic program with a unique solution and a tunable family of equilibria parameterized by $\epsilon$, balancing welfare and robustness. The paper proves convergence: JPSRO(CCE) converges to a Coarse CE and JPSRO(CE) to a CE, and demonstrates favorable convergence and welfare across Kuhn Poker, Trade Comm, and Sheriff. The approach supports scalable representations, invariant transformations, and efficient computation, with potential to scale via function approximation and online solvers in real-world, large-scale multi-agent settings.
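
To make the role of $\epsilon$ concrete, here is a minimal sketch that checks the coarse correlated equilibrium (CCE) constraints for a joint distribution and computes its Gini impurity. The 2x2 payoff table, the function names, and the tolerance are hypothetical choices for illustration and are not taken from the paper; the sketch only evaluates candidate distributions and does not solve the quadratic program.

```python
from itertools import product

# Hypothetical 2x2 payoffs (NOT from the paper): actions 0 = Go, 1 = Wait.
# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
PAYOFFS = {
    (0, 0): (-10.0, -10.0),  # both go: crash
    (0, 1): (1.0, 0.0),      # row goes, column waits
    (1, 0): (0.0, 1.0),      # row waits, column goes
    (1, 1): (-1.0, -1.0),    # both wait
}
ACTIONS = (0, 1)
PLAYERS = (0, 1)


def cce_gains(sigma):
    """Expected gain for each (player, fixed deviation action) pair.

    sigma maps joint actions to probabilities. A CCE deviation means the
    player ignores the recommendation and always plays `deviation`.
    """
    gains = {}
    for player, deviation in product(PLAYERS, ACTIONS):
        gain = 0.0
        for joint, prob in sigma.items():
            deviated = list(joint)
            deviated[player] = deviation
            gain += prob * (PAYOFFS[tuple(deviated)][player] - PAYOFFS[joint][player])
        gains[(player, deviation)] = gain
    return gains


def is_cce(sigma, epsilon=0.0, tol=1e-9):
    """True if no fixed deviation gains more than epsilon in expectation."""
    return all(g <= epsilon + tol for g in cce_gains(sigma).values())


def gini_impurity(sigma):
    """Gini impurity of the joint distribution: 1 - sum of squared probabilities."""
    return 1.0 - sum(p * p for p in sigma.values())


alternate = {(0, 1): 0.5, (1, 0): 0.5}            # players take turns going
uniform = {joint: 0.25 for joint in PAYOFFS}      # maximum Gini impurity

print(is_cce(alternate), gini_impurity(alternate))
print(is_cce(uniform), is_cce(uniform, epsilon=2.0), gini_impurity(uniform))
```

Under these assumed payoffs, the uniform distribution has the highest Gini impurity but violates the exact CCE constraints, and it becomes admissible only once $\epsilon$ is relaxed. This is the trade-off the $\epsilon$-parameterized family moves along.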

Abstract

Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.

Paper Structure

This paper contains 56 sections, 18 theorems, 53 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

MG(C)CE provides a unique solution to the equilibrium solution problem and always exists.
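
As a hedged sketch of why such a result is plausible, assume MG(C)CE is posed as maximizing the Gini impurity of the joint distribution $\sigma$ over the $\epsilon$-(C)CE polytope. The symbols $A_p$ (deviation-gain matrix for player $p$) and $e$ (vector of ones) are notation assumed here for illustration:

$$
\max_{\sigma} \; 1 - \sigma^\top \sigma
\quad \text{s.t.} \quad
A_p \sigma \le \epsilon \;\; \forall p, \qquad
\sigma \ge 0, \qquad
e^\top \sigma = 1 .
$$

The objective is strictly concave and the constraints are linear, so the feasible set is a convex polytope. That set is non-empty for $\epsilon \ge 0$ because a (C)CE always exists, and a strictly concave function has exactly one maximizer on a non-empty compact convex set.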

Figures (6)

  • Figure 1: The solution landscape for the traffic lights game. The solid polytope shows the space of CE joint strategies, and the dotted surface shows factorizable joint strategies. NEs are where the surface and polytope intersect. There are three unsatisfying NEs: the mixed NE spends most of its time waiting and does not avoid crashing, while the others favour only the row or column player. One MWCE provides a better solution (note that Row NE and Col NE, and any mixture of the two, are also MWCE solutions). The center of the tetrahedron is the uniform distribution, and the MECE and MGCE attempt to be near this point. The dashed lines correspond to the family of solutions permitted by MGCE and MECE when varying the approximation parameter $\epsilon$. Both have $(GW, WG) = (0.5, 0.5)$ as the $\min\epsilon$ solution. Player payoffs are given in parentheses.
  • Figure 2: JPSRO(CCE) on various games. Additional metrics can be found in the supplementary experiments section. MGCCE is consistently a good choice of MS over the games tested.
  • Figure 3: JPSRO(CCE) and JPSRO(CE) on three-player Kuhn Poker. All (C)CE MSs, PRD and $\alpha$-Rank find joint policies capable of supporting equilibrium (although $\alpha$-Rank was slow and was terminated after 6 hours). This is some evidence that classic MSs designed for the two-player, zero-sum setting can generalize well to the three-player, zero-sum setting.
  • Figure 4: JPSRO(CCE) and JPSRO(CE) on three-item Trade Comm. In JPSRO(CCE), $\frac{1}{100}\min$-MGCCE fails to find the maximum welfare equilibrium; however, all other (C)CE MSs find it. Unexpectedly, $\alpha$-Rank performs well on this game, while all other classic MSs fail to make progress on this purely cooperative game. Performing well on this game requires exploration, so the random joint MS is able to make progress, albeit naively and slowly.
  • Figure 5: JPSRO(CCE) and JPSRO(CE) on Sheriff. This game is interesting because it is general-sum and different solution concepts have different optimal maximum welfare values. The maximum welfare NFCCE is $13.64$ for the smuggler and $2.0$ for the sheriff, which JPSRO(CCE) successfully finds, while the maximum welfare NFCE is $0.82$ for the smuggler and $0.0$ for the sheriff, which JPSRO(CE) successfully finds. This demonstrates the appeal of using NFCCE as a target equilibrium. Interestingly, for this game, $\frac{1}{100}\epsilon$-MG(C)CE was able to produce BRs of high enough quality to converge, which is evidence that scaled methods that only approximate (C)CEs may be enough in some settings. RMWCCE converged to an equilibrium, but not the welfare-maximizing one, providing evidence that greedy MSs are not always suitable. Similarly, $\min$-$\epsilon$-MGCCE did not reach the maximum welfare solution within the allocated number of iterations. RV(C)CE is efficient at finding novel policies, but ones of limited utility. PRD and $\alpha$-Rank perform well and find the maximum welfare (C)CE equilibria.
  • ...and 1 more figures

Theorems & Definitions (31)

  • Theorem 1: Uniqueness and Existence
  • Theorem 2: Scalable Representation
  • Theorem 3: Existence of Full-Support $\epsilon$-MG(C)CE
  • Theorem 4
  • Theorem 5: Affine Payoff Transformation Invariance
  • Theorem 6: CCE Convergence
  • Theorem 7: CE Convergence
  • proof
  • Theorem 1: Uniqueness and Existence
  • proof
  • ...and 21 more