Game-Theoretic Multiagent Reinforcement Learning

Yaodong Yang, Chengdong Ma, Zihan Ding, Stephen McAleer, Chi Jin, Jun Wang, Tuomas Sandholm

TL;DR

The paper surveys the game-theoretic foundations and modern advances of multiagent reinforcement learning, connecting classic equilibrium concepts with contemporary algorithmic frameworks. It covers single-agent RL as a prelude, then extends to stochastic and extensive-form games, partially observable settings, and mean-field regimes to address scalability. Key contributions include a comprehensive taxonomy of MARL algorithms, rigorous treatment of equilibrium concepts (NE, CE, CCE), and detailed discussions of challenges such as non-stationarity, combinatorial complexity, and learning in large populations. The work highlights future directions across theory, safety, model-based approaches, meta-learning, and the integration of foundation models to push MARL toward robust, scalable real-world deployments.

Abstract

Tremendous advances have been made in multiagent reinforcement learning (MARL). MARL corresponds to the learning problem in a multiagent system in which multiple agents learn simultaneously. It is an interdisciplinary field of study with a long history that includes game theory, machine learning, stochastic control, psychology, and optimization. Despite great successes in MARL, there is a lack of a self-contained overview of the literature that covers game-theoretic foundations of modern MARL methods and summarizes the recent advances. The majority of existing surveys are outdated and do not fully cover the recent developments since 2010. In this work, we provide a monograph on MARL that covers both the fundamentals and the latest developments on the research frontier. The goal of this monograph is to provide a self-contained assessment of the current state-of-the-art MARL techniques from a game-theoretic perspective. We expect this work to serve as a stepping stone for both new researchers who are about to enter this fast-growing field and experts in the field who want to obtain a panoramic view and identify new directions based on recent advances.

Paper Structure

This paper contains 86 sections, 1 theorem, 107 equations, 8 figures, 4 tables, 1 algorithm.

Key Result

Theorem 8.1

For general-sum SGs, $\{\text{NE}\} \subseteq \{\text{CE}\} \subseteq \{\text{CCE}\}$.
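
The inclusion can be checked numerically on a single-state game, which is the simplest special case of an SG. The sketch below is illustrative only: the payoff matrices are hypothetical values for a two-car intersection game (they are not taken from the paper), and the helper names `pure_nash`, `is_ce`, and `is_cce` are assumptions introduced here. It tests the standard incentive constraints of a correlated equilibrium (CE) and a coarse correlated equilibrium (CCE) for a joint distribution over action pairs.

```python
import itertools
import numpy as np

# Hypothetical payoffs (an assumption, not the paper's values).
# Actions: 0 = yield, 1 = rush. R1[a1, a2] is player one's reward.
R1 = np.array([[0.0, -1.0],
               [1.0, -10.0]])
R2 = R1.T  # symmetric game: player two's reward is R2[a1, a2]

def pure_nash(R1, R2):
    """Joint actions where neither player gains by deviating unilaterally."""
    n1, n2 = R1.shape
    return [(a, b) for a in range(n1) for b in range(n2)
            if R1[a, b] >= R1[:, b].max() and R2[a, b] >= R2[a, :].max()]

def is_ce(p, R1, R2, tol=1e-9):
    """CE: after seeing a recommended action, no player gains by switching."""
    n1, n2 = p.shape
    for a, dev in itertools.product(range(n1), repeat=2):
        if p[a] @ (R1[dev] - R1[a]) > tol:
            return False
    for b, dev in itertools.product(range(n2), repeat=2):
        if p[:, b] @ (R2[:, dev] - R2[:, b]) > tol:
            return False
    return True

def is_cce(p, R1, R2, tol=1e-9):
    """CCE: no player gains by committing to one fixed action in advance."""
    v1, v2 = (p * R1).sum(), (p * R2).sum()
    p1, p2 = p.sum(axis=1), p.sum(axis=0)  # marginals of players one and two
    best1 = max(R1[a] @ p2 for a in range(R1.shape[0]))
    best2 = max(p1 @ R2[:, b] for b in range(R2.shape[1]))
    return best1 <= v1 + tol and best2 <= v2 + tol

def point_mass(a, b, shape):
    """Joint distribution that plays the pure profile (a, b) with probability 1."""
    p = np.zeros(shape)
    p[a, b] = 1.0
    return p

# Every pure NE, viewed as a point-mass joint distribution, is a CE and a CCE.
nash_in_ce_and_cce = all(
    is_ce(point_mass(a, b, R1.shape), R1, R2)
    and is_cce(point_mass(a, b, R1.shape), R1, R2)
    for a, b in pure_nash(R1, R2)
)

# A "traffic light" that recommends each pure NE half of the time.
traffic_light = np.array([[0.0, 0.5],
                          [0.5, 0.0]])
```

Under these assumed payoffs, the traffic-light distribution satisfies the CE constraints but is not a product of independent strategies, so it cannot be an NE. This illustrates why the first inclusion can be strict.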

Figures (8)

  • Figure 1: Diagram of a single-agent MDP (left) and a multiagent MDP (right).
  • Figure 2: A snapshot of stochastic time in the intersection example. The scenario is abstracted such that there are two cars, with each car taking one of two possible actions: to yield or to rush. The outcome of each joint action pair is represented by a normal-form game, with the reward value for the row player denoted in red and that for the column player denoted in black. The Nash equilibria (NE) of this game are (rush, yield) and (yield, rush). If both cars maximize their own reward selfishly without considering the others, they will end up in an accident.
  • Figure 3: The landscape of different complexity classes. Relevant examples are 1) solving the NE in a two-player zero-sum game, $P$-complete (von Neumann, 1928), 2) solving the NE in a general-sum game, $PPAD$-hard (Daskalakis et al., 2009), 3) checking the uniqueness of the NE, $NP$-hard (Conitzer and Sandholm, 2008), 4) checking whether a pure-strategy NE exists in a stochastic game, $PSPACE$-hard (Conitzer and Sandholm, 2008), and 5) solving Dec-POMDP, $NEXPTIME$-hard (Bernstein et al., 2002).
  • Figure 4: Venn diagram of different types of games in the context of POSGs. The intersection of SG and Dec-POMDP is the team game. In the upper-half SG, we have MDP $\subset$ team games $\subset$ potential games $\subset$ identical-interest games $\subset$ SGs. In the bottom-half Dec-POMDP, we have MDP $\subset$ team games $\subset$ Dec-MDP $\subset$ Dec-POMDPs, and MDP $\subset$ POMDP $\subset$ Dec-POMDP. See the paper's sections on stochastic games and on partially observable settings for detailed definitions of these games.
  • Figure 5: Game tree of two-player Kuhn poker. Each node (i.e., circles, squares and rectangles) represents the choice of one player, each edge represents a possible action, and the leaves (i.e., diamond) represent final outcomes over which each player has a reward function (only player one's reward is shown in the graph since Kuhn poker is a zero-sum game). Each player can observe only their own card; for example, when player one holds a Jack, it cannot tell whether player two is holding a Queen or a King, so the choice nodes of player one in each of the two scenarios stay within the same information set.
  • ...and 3 more figures
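
The information-set idea in the Figure 5 caption can be sketched in a few lines. This is a minimal illustration, assuming the standard three-card Kuhn poker deck (Jack, Queen, King) with one card dealt to each player. The names `CARDS`, `info_set_key`, and `info_sets` are hypothetical and introduced only for this example.

```python
from collections import defaultdict
from itertools import permutations

CARDS = ["J", "Q", "K"]  # assumed standard Kuhn poker deck

def info_set_key(deal, history, player):
    """A player observes only their own card and the public betting history."""
    return (deal[player], history)

# Group player one's opening decision nodes (empty betting history)
# by what player one can actually observe.
info_sets = defaultdict(list)
for deal in permutations(CARDS, 2):  # deal[0]: player one's card, deal[1]: player two's
    info_sets[info_set_key(deal, "", 0)].append(deal)

for key, nodes in info_sets.items():
    print(key, "contains deals", nodes)
```

The six possible deals collapse into three information sets for player one's first move. The two deals in which player one holds the Jack fall under the same key, which matches the caption's remark that player one cannot tell whether the opponent holds the Queen or the King.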

Theorems & Definitions (14)

  • Definition 2.1: Markov Decision Process
  • Definition 3.1: Stochastic Game
  • Definition 3.2: Nash Equilibrium for Stochastic Game
  • Definition 3.3: Special Types of Stochastic Games
  • Definition 3.4: Partially Observable Stochastic Game
  • Definition 3.5: Dec-POMDP
  • Definition 3.6: Special Types of Dec-POMDPs
  • Definition 3.7: Extensive-form Game
  • Definition 3.8: Sequence-form Representation
  • Definition 7.1
  • ...and 4 more