Multi-Agent Training beyond Zero-Sum with Correlated Equilibrium Meta-Solvers
Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel
TL;DR
This work advances multi-agent learning by introducing Joint Policy-Space Response Oracles (JPSRO) for n-player, general-sum extensive-form games and advocating correlated-equilibrium meta-solvers, especially Maximum Gini Correlated Equilibrium (MG(C)CE). MG(C)CE frames the equilibrium selection problem as a tractable quadratic program with a unique solution and yields a tunable family of equilibria, parameterized by $\boldsymbol{\epsilon}$, that trades off welfare against robustness. The paper proves that JPSRO(CCE) converges to a coarse correlated equilibrium (CCE) and JPSRO(CE) to a correlated equilibrium (CE), and demonstrates favorable convergence and welfare on Kuhn Poker, Trade Comm, and Sheriff. The approach supports scalable representations, invariant transformations, and efficient computation, and could scale to large, real-world multi-agent settings via function approximation and online solvers.
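To make the selection criterion concrete, here is a minimal pure-Python sketch, assuming a tiny normal-form game stored as a dict from joint-action tuples to payoff tuples (an illustrative encoding, not the paper's). It evaluates the ε-CCE constraints (no unconditional deviation gains more than `eps`) and the Gini impurity $1 - \sum_a \sigma(a)^2$ that MG(C)CE maximizes. The tractable quadratic program is replaced here by a brute-force search over a discretized simplex, which only works for toy games and is not how MG(C)CE would be computed in practice; all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical illustration only: a tiny normal-form game stored as
# {joint_action_tuple: payoff_tuple}; sigma is {joint_action_tuple: probability}.


def cce_gains(game, num_actions, sigma):
    """Gain of every unconditional deviation: for player p and action d,
    sum_a sigma(a) * (u_p(d, a_-p) - u_p(a)). sigma is an eps-CCE iff all gains <= eps."""
    gains = []
    for p, n_p in enumerate(num_actions):
        for d in range(n_p):
            gain = 0
            for joint, prob in sigma.items():
                deviated = joint[:p] + (d,) + joint[p + 1:]
                gain += prob * (game[deviated][p] - game[joint][p])
            gains.append(gain)
    return gains


def gini_impurity(sigma):
    """1 - sum_a sigma(a)^2; maximizing it is the same as minimizing ||sigma||^2."""
    return 1 - sum(prob * prob for prob in sigma.values())


def simplex_grid(k, steps):
    """Every distribution over k outcomes whose probabilities are multiples of 1/steps."""
    for head in product(range(steps + 1), repeat=k - 1):
        if sum(head) <= steps:
            yield tuple(Fraction(c, steps) for c in head + (steps - sum(head),))


def mgcce_by_grid(game, num_actions, eps=0, steps=20):
    """Brute-force stand-in (assumption, not the paper's solver) for the MG(C)CE
    quadratic program: among grid points satisfying the eps-CCE constraints,
    return the one with maximum Gini impurity, or (None, None) if none is feasible."""
    joints = list(product(*(range(n) for n in num_actions)))
    best, best_gini = None, None
    for probs in simplex_grid(len(joints), steps):
        sigma = dict(zip(joints, probs))
        if max(cce_gains(game, num_actions, sigma)) <= eps:
            g = gini_impurity(sigma)
            if best_gini is None or g > best_gini:
                best, best_gini = sigma, g
    return best, best_gini


# Usage: in the Prisoner's Dilemma (0 = cooperate, 1 = defect) the only CCE is
# mutual defection, so the selected distribution is a point mass on (1, 1).
pd = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
sigma, gini = mgcce_by_grid(pd, [2, 2])
print(sigma[(1, 1)], gini)
```

The design choice behind the uniqueness claim is visible in the objective: the ε-CCE constraints are linear, so the feasible set is a convex polytope, and $1 - \sum_a \sigma(a)^2$ is strictly concave, so the maximizer is unique whenever the set is non-empty. The grid search above only approximates that maximizer up to its resolution, and `eps` is treated simply as a tolerance on every deviation gain.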
Abstract
Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting. We propose Joint Policy-Space Response Oracles (JPSRO), an algorithm for training agents in n-player, general-sum extensive-form games, which provably converges to an equilibrium. We further suggest correlated equilibria (CE) as promising meta-solvers, and propose a novel solution concept, Maximum Gini Correlated Equilibrium (MGCE), a principled and computationally efficient family of solutions for solving the correlated equilibrium selection problem. We conduct several experiments using CE meta-solvers for JPSRO and demonstrate convergence on n-player, general-sum games.
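The best-response step at the heart of JPSRO(CCE) can be sketched for a simplified case. The paper works with extensive-form policies and a best-response oracle; the snippet below instead assumes, for illustration, that "policies" are the pure actions of a small normal-form game and that a meta-solver has already produced a distribution `sigma` over joint policies. Each player's best response maximizes expected payoff while the other players keep following `sigma`, and `sigma` is a CCE of the full game exactly when no such best response gains anything over following it. All names are hypothetical; a full JPSRO loop would add the returned actions to each player's population and re-run the meta-solver.

```python
# Hypothetical sketch: "policies" are pure actions of a small normal-form game.
# game maps a joint action tuple to a tuple of per-player payoffs;
# sigma maps joint actions to probabilities (the meta-solver's output).


def deviation_value(game, p, action, sigma):
    """Expected payoff to player p for playing `action` unconditionally
    while the other players keep following sigma."""
    total = 0.0
    for joint, prob in sigma.items():
        deviated = joint[:p] + (action,) + joint[p + 1:]
        total += prob * game[deviated][p]
    return total


def cce_best_responses(game, num_actions, sigma):
    """For each player, return (best_action, gain), where gain is how much the
    best unconditional deviation improves on following sigma."""
    results = []
    for p, n_p in enumerate(num_actions):
        follow = sum(prob * game[joint][p] for joint, prob in sigma.items())
        values = [deviation_value(game, p, a, sigma) for a in range(n_p)]
        best = max(range(n_p), key=values.__getitem__)
        results.append((best, values[best] - follow))
    return results


def is_cce(game, num_actions, sigma, eps=0.0):
    """sigma is an eps-CCE iff no player's best response gains more than eps."""
    return all(gain <= eps for _, gain in cce_best_responses(game, num_actions, sigma))


# Usage: matching pennies, player 0 wins on a match, player 1 on a mismatch.
pennies = {(a0, a1): (1, -1) if a0 == a1 else (-1, 1)
           for a0 in range(2) for a1 in range(2)}
start = {(0, 0): 1.0}  # each population holds a single policy
print(cce_best_responses(pennies, [2, 2], start))  # player 1 should add action 1
uniform = {joint: 0.25 for joint in pennies}
print(is_cce(pennies, [2, 2], uniform))
```

Because the deviation is unconditional, each player needs only one best response per iteration, computed against the other players' part of `sigma`; a CE-style variant would instead compute one best response per recommended policy, conditioned on that recommendation.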
