Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

Ruiqi Zhang; Jing Hou; Florian Walter; Shangding Gu; Jiayi Guan; Florian Röhrbein; Yali Du; Panpan Cai; Guang Chen; Alois Knoll

TL;DR

This survey addresses MARL for autonomous driving by framing the problem within multi-agent decision making and detailing the MARL ecosystem, including benchmarks, simulators, datasets, and competitions. It presents a taxonomy of learning schemes, centralized training with decentralized execution (CTDE) versus decentralized training and execution (DTDE), and highlights core methodologies (centralized MARL, independent policies, social preferences, and safety-focused learning) with representative algorithms such as MADDPG, QMIX, MAPPO, PRIMAL, and CoPO, framed within the MDP/Dec-POMDP formalism. The authors identify key challenges such as multi-modal fusion, robustness, sim-to-real transfer, safety verification, and explainability, and propose future directions including model-based MARL, offline data collection, human-in-the-loop learning, and the use of language models to augment driving systems. By compiling benchmarks, simulator reviews, and methodological insights, the work provides a practical roadmap for researchers to design, test, and deploy MARL-driven autonomous driving technologies, and it offers a GitHub repository to track updates. Throughout the discussion, $\mathcal{S}$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{T}$, and $\gamma$ denote the standard MDP components: the state space, action space, reward function, transition function, and discount factor.
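
For readers new to the notation, the symbols above can be written out as follows. This is the standard textbook form rather than a formula quoted from the survey; the agent count $N$, the observation spaces $\Omega^i$, the observation function $O$, and the policy symbols $\pi$, $\pi^i$ are notation assumed here for illustration.

```latex
\begin{aligned}
\text{MDP:}\quad
  & \langle \mathcal{S}, \mathcal{A}, \mathcal{T}, \mathcal{R}, \gamma \rangle, \qquad
    \mathcal{T}: \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S}), \quad
    \mathcal{R}: \mathcal{S} \times \mathcal{A} \to \mathbb{R}, \quad
    \gamma \in [0, 1) \\
\text{Objective:}\quad
  & J(\pi) = \mathbb{E}_{\pi,\, \mathcal{T}}
    \Big[ \sum_{t=0}^{\infty} \gamma^{t}\, \mathcal{R}(s_t, a_t) \Big] \\
\text{Dec-POMDP:}\quad
  & \langle N, \mathcal{S}, \{\mathcal{A}^i\}_{i=1}^{N}, \mathcal{T}, \mathcal{R},
    \{\Omega^i\}_{i=1}^{N}, O, \gamma \rangle, \qquad
    a_t^i \sim \pi^i(\cdot \mid o_t^i)
\end{aligned}
```

In the Dec-POMDP case each agent $i$ acts on its own observation $o_t^i$ rather than on the full state $s_t$, which is what makes the multi-agent driving setting partially observable.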

Abstract

Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL to multi-agent systems, multi-agent RL (MARL) must not only learn the control policy but also account for interactions with all other agents in the environment, mutual influences among different system components, and the distribution of computational resources. This increases the complexity of algorithm design and places higher demands on computational resources. At the same time, simulators are crucial for obtaining realistic data, which is fundamental to RL. In this paper, we first propose a series of metrics for simulators and summarize the features of existing benchmarks. Second, to ease comprehension, we recall the foundational knowledge and then synthesize recent advances in MARL for autonomous driving and intelligent transportation systems. Specifically, we examine their environmental modeling, state representation, perception units, and algorithm design. Finally, we discuss open challenges as well as prospects and opportunities. We hope this paper helps researchers integrate MARL technologies and sparks more insightful ideas toward intelligent and autonomous driving.

Paper Structure (53 sections, 15 equations, 6 figures, 2 tables)

Introduction
Autonomous Driving Benchmarks
What is important for a good benchmark?
Realism and Fidelity
Scalability
Diversity
Efficiency
Transferability
Features, Maintenance and Supports
Advanced Simulators
The Open Racing Car Simulator
Simulation of Urban Mobility
Scalable Multi-Agent Reinforcement Learning Training School
MetaDrive
CAR Learning to Act
...and 38 more sections

Figures (6)

Figure 1: Number of publications matching the keywords multi-agent reinforcement learning, autonomous driving, and intelligent transportation systems from 2015 to 2023 (from Dimension AI [dimensions]). All three research topics are developing rapidly and attracting increasing attention from academia.
Figure 2: Timeline of the evolution and representative studies of autonomous driving, deep learning, and RL. Constrained by the hardware of the time, early research followed a hierarchical scheme, i.e. perception-planning-control. Around 2014, with the rapid development of deep learning and the emergence of datasets, data-driven methods became the mainstream for a time. From 2015 to 2019, numerous RL algorithms appeared, and researchers recognized the opportunities of end-to-end control through simulators. In 2017, MADDPG [maddpg2017] extended single-agent RL to multi-agent systems, leading to extensive follow-up research on how to control large numbers of autonomous vehicles through MARL. In 2022, ChatGPT was launched and drew massive attention to language models.
Figure 3: The Centralized Training-Decentralized Execution (CTDE) scheme versus the Decentralized Training and Execution (DTDE) scheme. In the CTDE paradigm, a central controller concatenates all observations and distributes policies to all agents. In the DTDE paradigm, each agent computes its policy itself. Depending on the hardware setup, agents can communicate and exchange information with each other, or detect others' states with sensors.
Figure 4: Four examples at a free intersection showing how social preference affects the behavior of AVs. In (a) and (c), the combination (i.e. one egoistic + one altruistic) is healthy; in (b), two altruistic cars would wait for each other; in (d), two egoistic cars would crash into each other.
Figure 5: The architecture of Coordinated Policy Optimization (CoPO) [copo]. The bi-level training process makes it possible to balance egoism and altruism.
...and 1 more figures
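
The ideas in the captions of Figures 3 and 4 can be illustrated with a small, standard-library-only sketch. It is not the survey's implementation. It assumes a linear actor and critic, a critic-based reading of CTDE (a centralized critic over joint observations and actions, in the style of MADDPG), and a cosine/sine blend of own and others' rewards controlled by a preference angle. All names, sizes, and the 45-degree yielding threshold are hypothetical.

```python
import math
import random

random.seed(0)
N_AGENTS, OBS_DIM = 3, 4  # hypothetical sizes, for illustration only


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))


def act(actor_w, obs):
    """Decentralized execution: an agent acts on its own observation only."""
    return math.tanh(dot(actor_w, obs))  # scalar action in [-1, 1]


def centralized_value(critic_w, all_obs, all_acts):
    """CTDE-style critic: scores the concatenated joint observations and
    actions. It is used during training and is not needed at execution."""
    joint = [x for obs in all_obs for x in obs] + list(all_acts)
    return dot(critic_w, joint)


def independent_value(critic_w, obs, action):
    """DTDE-style critic: scores one agent's own observation and action."""
    return dot(critic_w, list(obs) + [action])


def social_reward(r_ego, r_others, phi_deg):
    """Blend own and others' reward by a preference angle (assumed form).
    phi_deg = 0 is fully egoistic; phi_deg = 90 is fully altruistic."""
    phi = math.radians(phi_deg)
    return math.cos(phi) * r_ego + math.sin(phi) * r_others


def intersection_outcome(phi_a, phi_b, yield_above_deg=45.0):
    """Toy rule for the four cases in Figure 4 (threshold is hypothetical)."""
    a_yields = phi_a >= yield_above_deg
    b_yields = phi_b >= yield_above_deg
    if a_yields and b_yields:
        return "deadlock"   # two altruistic cars wait for each other
    if not a_yields and not b_yields:
        return "collision"  # two egoistic cars both enter
    return "resolved"       # one proceeds, the other yields


def gauss_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


actor_ws = [gauss_vec(OBS_DIM) for _ in range(N_AGENTS)]
observations = [gauss_vec(OBS_DIM) for _ in range(N_AGENTS)]

# Execution is the same in both schemes: each agent uses its own observation.
actions = [act(w, o) for w, o in zip(actor_ws, observations)]

# Training differs: one joint critic (CTDE) versus one local critic per agent (DTDE).
ctde_critic = gauss_vec(N_AGENTS * (OBS_DIM + 1))
dtde_critics = [gauss_vec(OBS_DIM + 1) for _ in range(N_AGENTS)]

q_joint = centralized_value(ctde_critic, observations, actions)
q_local = [independent_value(w, o, a)
           for w, o, a in zip(dtde_critics, observations, actions)]
```

The joint critic's input grows linearly with the number of agents, which is one reason scalability matters for CTDE methods, while the local critics stay fixed in size but see only part of the scene. Calling `intersection_outcome(0, 90)`, `intersection_outcome(90, 90)`, and `intersection_outcome(0, 0)` walks through the mixed, all-altruistic, and all-egoistic cases described in Figure 4.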
