Tree Search for Simultaneous Move Games via Equilibrium Approximation

Ryan Yu; Alex Olshevsky; Peter Chin

TL;DR

The paper tackles learning in simultaneous-move, partial-information multi-agent games by injecting game-theoretic equilibrium reasoning into tree search. It proposes NN-CCE, which uses per-agent policies and MA-EXP-IX to approximate an $\epsilon$-coarse correlated equilibrium ($\epsilon$-CCE) within a structured, time-layered Monte Carlo tree search (MCTS) framework trained via self-play. Across OpenSpiel, Google Football, SMAC, and related benchmarks, NN-CCE outperforms equilibrium-approximation baselines and strong MARL methods with improved consistency and robustness, albeit at the cost of longer training times. The work advances practical equilibrium-aware planning for discrete-action multi-agent settings and suggests paths toward continuous-action extensions and broader scalability.
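For reference, here is the standard definition behind the $\epsilon$-CCE mentioned above, written in generic notation that may differ from the paper's own. Let $A = A_1 \times \cdots \times A_n$ be the joint action set, $u_i$ agent $i$'s payoff, and $\sigma$ a distribution over $A$. Then $\sigma$ is an $\epsilon$-CCE if no agent can gain more than $\epsilon$ by committing in advance to a single action $a_i'$ instead of following $\sigma$:

$$\mathbb{E}_{a \sim \sigma}\left[u_i(a)\right] \ge \mathbb{E}_{a \sim \sigma}\left[u_i(a_i', a_{-i})\right] - \epsilon \quad \text{for every agent } i \text{ and every } a_i' \in A_i.$$

Setting $\epsilon = 0$ recovers an exact coarse correlated equilibrium; every Nash equilibrium, viewed as a product distribution, satisfies this condition.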

Abstract

Neural-network-supported tree search has shown strong results in a variety of perfect-information multi-agent tasks. However, its performance on partial-information games has generally lagged behind competing approaches. Here we study simultaneous-move games, the subclass of partial-information games closest to perfect-information games: both agents know the full game state except for the opponent's move, which is revealed only after each agent has made its own move. Simultaneous-move games include popular benchmarks such as Google Research Football and StarCraft. In this study we ask: can tree search algorithms trained through self-play in perfect-information settings be adapted to simultaneous-move games without significant loss of performance? We answer this question by deriving a practical method that attempts to approximate a coarse correlated equilibrium as a subroutine within tree search. Our algorithm works on cooperative, competitive, and mixed tasks, and it outperforms the current best MARL algorithms on a wide range of accepted baseline environments.
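The abstract describes approximating a coarse correlated equilibrium as a subroutine inside tree search. The sketch below illustrates the underlying idea at a single simultaneous-move node, under loudly stated assumptions: the node's stage game is a fixed two-player payoff table with rewards in [0, 1] (in an actual search these would be value estimates for the child nodes), and each agent runs a standard single-agent Exp3-IX bandit learner (Neu, 2015). This is not the authors' MA-EXP-IX; the function names `exp3_ix_self_play` and `cce_gap` and all parameter values are hypothetical. What it shows is the standard fact that when every agent has small average external regret, the empirical distribution of joint play is an approximate CCE, and `cce_gap` measures the resulting $\epsilon$ directly.

```python
import numpy as np


def exp3_ix_self_play(payoffs, n_iters=20000, eta=0.01, seed=0):
    """Run two independent Exp3-IX learners on one simultaneous-move stage game.

    Hypothetical helper, not the paper's MA-EXP-IX.
    payoffs: array of shape (2, n0, n1); payoffs[i, a0, a1] is player i's
    reward in [0, 1] for the joint action (a0, a1).
    Returns the empirical distribution of joint play, shape (n0, n1).
    """
    rng = np.random.default_rng(seed)
    sizes = payoffs.shape[1:]
    gamma = eta / 2.0  # implicit-exploration term (eta = 2 * gamma, following Neu, 2015)
    cum_loss = [np.zeros(k) for k in sizes]  # importance-weighted loss estimates
    joint_counts = np.zeros(sizes)

    for _ in range(n_iters):
        probs, actions = [], []
        for i in range(2):
            # Exponential weights over cumulative estimated losses (shifted for stability).
            logits = -eta * cum_loss[i]
            p = np.exp(logits - logits.max())
            p /= p.sum()
            probs.append(p)
            actions.append(rng.choice(sizes[i], p=p))
        a0, a1 = actions
        joint_counts[a0, a1] += 1

        # Bandit feedback: each player only sees its own reward for the sampled action.
        for i in range(2):
            a = actions[i]
            loss = 1.0 - payoffs[i, a0, a1]
            cum_loss[i][a] += loss / (probs[i][a] + gamma)

    return joint_counts / n_iters


def cce_gap(payoffs, sigma):
    """Largest gain any player gets from a fixed unilateral deviation under sigma.

    sigma is an epsilon-CCE exactly when this value is <= epsilon.
    """
    # Player 0 switches to a fixed a0' while a1 is still drawn from sigma's marginal.
    value0 = np.sum(sigma * payoffs[0])
    dev0 = payoffs[0] @ sigma.sum(axis=0)
    # Player 1 switches to a fixed a1' while a0 is still drawn from sigma's marginal.
    value1 = np.sum(sigma * payoffs[1])
    dev1 = sigma.sum(axis=1) @ payoffs[1]
    return max(dev0.max() - value0, dev1.max() - value1)


if __name__ == "__main__":
    # Matching pennies rescaled to [0, 1]: player 0 wins on a match, player 1 on a mismatch.
    p0 = np.array([[1.0, 0.0], [0.0, 1.0]])
    pennies = np.stack([p0, 1.0 - p0])
    sigma = exp3_ix_self_play(pennies)
    print("empirical joint play:\n", sigma)
    print("CCE gap:", cce_gap(pennies, sigma))
```

Design note: the implicit-exploration term `gamma` in the denominator biases the loss estimates slightly downward in exchange for lower variance than plain Exp3, and the small fixed step size `eta` trades slower adaptation for a tighter average-regret bound over the fixed horizon.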

Paper Structure

This paper contains 28 sections, 4 equations, 1 figure, 3 tables, and 4 algorithms.

Introduction
Background Information
Deep MARL Training
Challenges of MARL
Game Theory and Online Learning
Related Work
Methodology
Experimental Parameters
Points of Comparison
Environments
Compared Algorithms and Evaluation Metrics
Performance Results
Conclusion and Future Work
Appendix
Factors influential to NN-CCE Performance
...and 13 more sections

Figures (1)

Figure 1: Results on GFR (Google Research Football). NN-CCE (ours) and MADDPG are trained via self-play, whereas MAPPO is trained against a fixed algorithmic opponent.
