Multi-agent Reinforcement Learning: A Comprehensive Survey

Dom Huh; Prasant Mohapatra

TL;DR

This survey analyzes multi-agent reinforcement learning (MARL) through the lens of game theory (GT) and machine learning (ML), outlining how decentralized multi-agent environments, stochastic games, and various GT concepts shape learning dynamics. It provides a taxonomy of multi-agent system (MAS) representations, learning paradigms (CTCE/DTDE/CTDE), credit assignment, communication, MOA, ad-hoc team-play, and social learning, and details core challenges such as non-stationarity, scalability, and evaluation. The work emphasizes integrating GT insights with data-driven MARL methods to guide robust, scalable, and socially aware agent coordination, and highlights directions such as foundation models, open-source ecosystems, and advanced communication and MOA techniques. Overall, the paper offers a holistic framework for understanding the current state of MARL and motivates future GT-informed ML advances in multi-agent control.

Abstract

Multi-agent systems (MAS) are prevalent in numerous real-world applications, where multiple agents must make decisions to achieve their objectives in a shared environment. Despite their ubiquity, developing intelligent decision-making agents in MAS poses several open challenges. This survey examines these challenges, emphasizing seminal concepts from game theory (GT) and machine learning (ML) and connecting them to recent advancements in multi-agent reinforcement learning (MARL), i.e., the study of data-driven decision-making within MAS. The objective of this survey is to provide a comprehensive perspective along the various dimensions of MARL, shedding light on the unique opportunities presented in MARL applications while highlighting the inherent challenges that accompany this potential. We hope that our work will not only contribute to the field by analyzing the current landscape of MARL but also motivate future directions with insights for deeper integration of concepts from the related domains of GT and ML. With this in mind, this work explores in detail recent and past efforts in MARL and its related fields, describing previously proposed solutions, their limitations, and their applications.

Paper Structure (86 sections, 25 equations, 6 figures, 3 tables)

Introduction
Related Surveys
Background
Multi-agent Environment
Stochastic Game
Sequential and Macro-Actions
Imperfect Information
Reward Function
Nature of Interaction
Social Context
Networked Games
Coordination
Return
Value Function
MAS Objective
...and 71 more sections

Figures (6)

Figure 1: A visualization of a generalized multi-agent system following the iterative control process.
Figure 2: Models of Games: An overview of different models of multi-agent interactions, from normal-form games to variations of stochastic games. We note that Markov Decision Processes (MDPs) and POMDPs are not models of a game but are included for a complete illustration. This figure was adapted from Albrecht2024Book.
Figure 3: Policy Iteration: The process of policy iteration consists of an iterative cycle of policy evaluation (shown as $\xrightarrow{\text{E}}$) and policy improvement (shown as $\xrightarrow{\text{I}}$). Policy evaluation computes the value function for the current policy, whereas policy improvement updates the current policy with respect to the evaluated value function. This figure was taken and modified from Sutton2018RL.
Figure 4: Visualization of relative overgeneralization in a two-player game from Wei2016lenient.
Figure 5: A general diagram of a holonic/multilevel simulation taken from tchappi2018brief.
...and 1 more figure

Theorems & Definitions (12)

Definition 1: Decentralization
Definition 2: Stochastic Game
Definition 3
Definition 4: Pareto Efficiency
Definition 5: Nash Equilibrium
Definition 6
Definition 7: Learning Pathology
Definition 8: Incompatible Equilibria
Definition 9: Shadowed Equilibrium
Definition 10: Action Shadowing
...and 2 more
