Open-ended Learning in Symmetric Zero-sum Games
David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel
TL;DR
The paper develops a geometric framework for open-ended learning in symmetric zero-sum games. It models agents as parametrized strategies in functional-form games (FFGs) and introduces the concept of gamescapes. FFGs are decomposed into transitive and cyclic components via a Hodge-like decomposition, and two population-level metrics, population performance and effective diversity, guide learning beyond single-agent improvement. Two algorithms, PSRO_N and PSRO_rN, are proposed; PSRO_rN uses niching to grow a population of diverse, effective strategies and outperforms baselines in highly nontransitive games such as Blotto and differentiable Lotto. The work connects gradient-based learning with game-theoretic objectives, formalizes the notion of an evolving strategy landscape, and provides tools for analyzing and broadening the exploration of strategic dimensions through adaptive objectives. Overall, PSRO_rN yields stronger, more diverse populations and opens avenues for robust open-ended learning in complex, nontransitive environments.
Abstract
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives.
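The contrast between transitive and cyclic games can be made concrete on a finite population. The sketch below is illustrative only and is not the authors' code. It assumes an antisymmetric payoff matrix, rates each agent by its average payoff against the population (a finite-population stand-in for the Hodge-like decomposition), and assumes effective diversity takes the form p^T max(A, 0) p for a Nash mixture p supplied by the caller. The names `decompose`, `effective_diversity`, `RPS`, and `LADDER` are hypothetical.

```python
def decompose(payoff):
    """Split an antisymmetric payoff matrix into transitive and cyclic parts.

    Assumption: each agent's rating is its mean payoff against the population.
    """
    n = len(payoff)
    rating = [sum(row) / n for row in payoff]
    # Transitive part: payoff explained purely by the rating difference.
    transitive = [[rating[i] - rating[j] for j in range(n)] for i in range(n)]
    # Cyclic part: whatever the ratings cannot explain.
    cyclic = [[payoff[i][j] - transitive[i][j] for j in range(n)]
              for i in range(n)]
    return rating, transitive, cyclic


def effective_diversity(payoff, nash):
    """Assumed form p^T max(A, 0) p; the mixture `nash` is not computed here."""
    n = len(payoff)
    return sum(nash[i] * max(payoff[i][j], 0.0) * nash[j]
               for i in range(n) for j in range(n))


# Rock, paper, scissors: entry [i][j] is the payoff to row i against column j.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

# A purely transitive game: payoff is the difference of skills 0, 1, 2.
LADDER = [[0, -1, -2],
          [1, 0, -1],
          [2, 1, 0]]

rps_rating, rps_transitive, rps_cyclic = decompose(RPS)
ladder_rating, ladder_transitive, ladder_cyclic = decompose(LADDER)
```

Under these assumptions, rock-paper-scissors has all ratings equal to zero, so the whole game lands in the cyclic part, while the skill ladder has an all-zero cyclic part. `effective_diversity` expects the caller to provide the Nash mixture, for example the uniform mixture for rock-paper-scissors; for larger populations it would have to come from a separate solver such as a linear program.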
