Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games
Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen
TL;DR
This work addresses the challenge of finding unexploitable equilibria in large two-team zero-sum games. It formalizes the Correlated-Team Maxmin Equilibrium (CTME) as the optimal unexploitable policy under joint coordination, but exact computation is intractable because the joint policy space grows exponentially. To tackle this, it introduces restricted CTME (rCTME) with a configurable sample factor and a sequential correlation mechanism. The authors propose Sequential PSRO (S-PSRO), a practical algorithm that approximates rCTME by constructing sequentially correlated team policies via a Sequential Best Response (SeBR) and a sequential communication protocol, which lets learning scale to increasingly large games. Empirically, S-PSRO achieves lower exploitability and better performance than baselines in normal-form games (SAD), MAgent Battle, and Google Research Football, demonstrating its effectiveness in diverse, large-scale multi-agent settings. These contributions offer a scalable framework for robust strategic decision-making in adversarial multi-agent environments, with potential applications in complex online games and security domains.
Abstract
Two-team zero-sum games are one of the most important paradigms in game theory. In this paper, we focus on finding an unexploitable equilibrium in large team games. An unexploitable equilibrium is a worst-case policy: members of the opponent team cannot increase their team reward by adopting any other policy, e.g., by cooperatively switching to another joint policy. As an optimal unexploitable equilibrium in two-team zero-sum games, the correlated-team maxmin equilibrium remains unexploitable even in the worst case, where players in the opponent team can achieve arbitrary cooperation through a joint team policy. However, finding such an equilibrium in large games is challenging because evaluating the exponentially large number of joint policies is impractical. To solve this problem, we first introduce a general solution concept called restricted correlated-team maxmin equilibrium, which uses a sample factor to address the impossibility of evaluating all joint policies while avoiding the exploitation problem that arises under incomplete joint policy evaluation. We then develop an efficient sequential correlation mechanism and, based on it, propose an algorithm for approximating the unexploitable equilibrium in large games. We show that our approach achieves lower exploitability than the state-of-the-art baseline when facing opponent teams with different exploitation abilities in large team games, including Google Research Football.
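To make the notions of a worst-case policy and an exponentially large joint policy space concrete, the following is a minimal Python sketch on a toy normal-form team game. It is not the paper's algorithm: the helper names `joint_actions` and `worst_case_value`, the 4x4 payoff matrix, and the two example policies are all assumptions chosen for illustration. It only shows that a team's joint action set has size (actions per player) to the power (players per team), and that a policy's worst-case value is its minimum expected reward over the opponent team's joint actions.

```python
import itertools
import numpy as np


def joint_actions(n_players, n_actions):
    """Enumerate all joint pure actions of one team (hypothetical helper).

    The list has n_actions ** n_players entries, which is what makes
    exhaustive evaluation impractical in large games.
    """
    return list(itertools.product(range(n_actions), repeat=n_players))


def worst_case_value(team_policy, payoff):
    """Reward our team is guaranteed when the opponent team best-responds jointly.

    team_policy: probability vector over our team's joint actions.
    payoff[i, j]: our team's reward for our joint action i against the
                  opponent team's joint action j (zero-sum, so they get -payoff).
    """
    expected = team_policy @ payoff  # expected reward vs. each opponent joint action
    return float(expected.min())     # a fully coordinated opponent picks the minimum


# Toy game (assumed for illustration): 2 players per team, 2 actions each.
actions = joint_actions(n_players=2, n_actions=2)  # 4 joint actions per team
n = len(actions)

# Our team earns +1 if the two teams' joint actions match, -1 otherwise.
payoff = 2.0 * np.eye(n) - np.ones((n, n))

uniform = np.full(n, 1.0 / n)  # correlated policy mixing over all joint actions
pure = np.zeros(n)
pure[0] = 1.0                  # always play the first joint action

print("joint actions per team:", n)
print("worst case, uniform policy:", worst_case_value(uniform, payoff))
print("worst case, pure policy:", worst_case_value(pure, payoff))
```

In this toy game the uniform mixture over joint actions has a higher worst-case value than the pure policy, so it is harder for a coordinated opponent to exploit. With 3 players and 5 actions each, `joint_actions` already returns 125 entries, which hints at why sampling a subset of joint policies becomes attractive as games grow.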
