Mastering Zero-Shot Interactions in Cooperative and Competitive Simultaneous Games
Yannik Mahlau, Frederik Schubert, Bodo Rosenhahn
TL;DR
The paper addresses zero-shot interaction in simultaneous multi-agent games by introducing the Smooth Best Response Logit Equilibrium (SBRLE) and the Albatross framework, which learns to approximate SBRLE through a two-stage training regime (proxy and response models) conditioned on opponent rationality via a temperature parameter $\tau$. By combining planning with self-play, Albatross adapts to agents of varying strength, enabling cooperative behavior with unknown partners and exploiting weak opponents in competitive settings. Empirical results show state-of-the-art performance in cooperative Overcooked (approximately 37.6% improvement over prior work) and superior exploitation of weaker agents in Battlesnake, while online estimation of opponents’ rationality sheds light on the dynamics of zero-shot interactions. This approach advances human-AI collaboration by modeling bounded rationality and provides a scalable framework for zero-shot coordination and competition in complex multi-agent environments, with publicly available code for reproducibility.
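As an illustration of what online estimation of an opponent's rationality could look like, the following minimal sketch fits $\tau$ by maximum likelihood over a grid of candidate values. It is an assumption for illustration, not the estimator used in the paper: the function names `log_likelihood` and `estimate_tau`, the grid range, and the convention that $\tau$ multiplies the utilities (so $\tau = 0$ is uniform play and large $\tau$ is near-greedy play) are all hypothetical choices.

```python
import numpy as np

def log_likelihood(tau, utilities, actions):
    """Log-likelihood of observed actions under a softmax policy.

    Assumed convention: policy(a) is proportional to exp(tau * utility(a)).
    """
    total = 0.0
    for u, a in zip(utilities, actions):
        z = tau * np.asarray(u, dtype=float)
        z -= z.max()  # shift for numerical stability
        total += z[a] - np.log(np.exp(z).sum())
    return total

def estimate_tau(utilities, actions, grid=None):
    """Return the grid value of tau that best explains the observed actions."""
    if grid is None:
        grid = np.linspace(0.0, 10.0, 101)  # hypothetical search range
    scores = [log_likelihood(t, utilities, actions) for t in grid]
    return float(grid[int(np.argmax(scores))])

# An agent that always picks the higher-utility action looks highly rational,
# one that always picks the lower-utility action looks no better than random.
utilities = [[1.0, 0.0]] * 20
tau_greedy = estimate_tau(utilities, [0] * 20)
tau_worst = estimate_tau(utilities, [1] * 20)
```

In an online setting the estimate would be recomputed as each new observation of the other agent arrives, and the resulting value passed to the temperature-conditioned model.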
Abstract
The combination of self-play and planning has achieved great successes in sequential games, for instance in Chess and Go. However, adapting algorithms such as AlphaZero to simultaneous games poses a new challenge. In these games, missing information about the concurrent actions of other agents is a limiting factor, because those agents may select different Nash equilibria or may not play optimally at all. Thus, it is vital to model the behavior of the other agents when interacting with them in simultaneous games. To this end, we propose Albatross: AlphaZero for Learning Bounded-rational Agents and Temperature-based Response Optimization using Simulated Self-play. Albatross learns to play the novel equilibrium concept of a Smooth Best Response Logit Equilibrium (SBRLE), which enables cooperation and competition with agents of any playing strength. We perform an extensive evaluation of Albatross on a set of cooperative and competitive simultaneous perfect-information games. In contrast to AlphaZero, Albatross is able to exploit weak agents in the competitive game of Battlesnake. Additionally, it yields an improvement of 37.6% over the previous state of the art in the cooperative Overcooked benchmark.
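To make the idea of responding to a bounded-rational agent concrete, the sketch below works through a two-player matrix game. It approximates a logit equilibrium at temperature $\tau$ and then computes a smoothed best response for one player against the other player's equilibrium strategy. This is a simplified reading of the SBRLE idea under stated assumptions, not the paper's formal definition or the Albatross training procedure: the damped fixed-point iteration, the function names, the separate `response_tau` parameter, and the convention that $\tau$ multiplies the utilities are all hypothetical, and the iteration is not guaranteed to converge in every game.

```python
import numpy as np

def softmax(values, tau):
    """Softmax with tau as a multiplier: tau=0 is uniform, large tau is near-greedy."""
    z = tau * np.asarray(values, dtype=float)
    z -= z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def logit_equilibrium(payoff_a, payoff_b, tau, iters=2000, step=0.1):
    """Approximate a two-player logit equilibrium by damped fixed-point iteration.

    payoff_a[i, j] and payoff_b[i, j] are the utilities of the row and column
    player when the row player picks i and the column player picks j.
    """
    n, m = payoff_a.shape
    p = np.full(n, 1.0 / n)  # row player's mixed strategy
    q = np.full(m, 1.0 / m)  # column player's mixed strategy
    for _ in range(iters):
        p_new = softmax(payoff_a @ q, tau)
        q_new = softmax(payoff_b.T @ p, tau)
        p = (1 - step) * p + step * p_new
        q = (1 - step) * q + step * q_new
    return p, q

def smooth_best_response(payoff_a, q, response_tau):
    """Softmax response of the row player to a fixed column strategy q."""
    return softmax(payoff_a @ q, response_tau)

# Cooperative coordination game: both players receive the same payoff.
payoff = np.array([[2.0, 0.0], [0.0, 1.0]])

# A partner with tau=0 acts uniformly at random.
_, q_random = logit_equilibrium(payoff, payoff, tau=0.0)
response_to_random = smooth_best_response(payoff, q_random, response_tau=10.0)

# A more rational partner (tau=5) concentrates on the high-payoff action.
_, q_rational = logit_equilibrium(payoff, payoff, tau=5.0)
response_to_rational = smooth_best_response(payoff, q_rational, response_tau=10.0)
```

The responding player uses its own, typically higher, temperature, so it can act close to optimally against a weaker partner or opponent while the model of the other agent stays bounded-rational.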
