Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey
Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll
TL;DR
This survey addresses MARL for autonomous driving by framing the problem within multi-agent decision making and detailing the MARL ecosystem, including benchmarks, simulators, datasets, and competitions. It presents a taxonomy of learning schemes (CTDE vs. DTDE) and highlights core methodologies (centralized MARL, independent policies, social preferences, and safety-focused learning) with representative algorithms such as MADDPG, QMIX, MAPPO, PRIMAL, and CoPO, all framed within the MDP/Dec-POMDP formalism. The authors identify key challenges such as multi-modal fusion, robustness, sim-to-real transfer, safety verification, and explainability, and propose future directions including model-based MARL, offline data collection, human-in-the-loop learning, and the use of language models to augment driving systems. By compiling benchmarks, simulator reviews, and methodological insights, the work provides a practical roadmap for researchers to design, test, and deploy MARL-driven autonomous driving technologies, and it offers a GitHub repository to track updates. Throughout the discussion, $\mathcal{S}$, $\mathcal{A}$, $\mathcal{R}$, $\mathcal{T}$, and $\gamma$ denote the standard MDP components.
Abstract
Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL to multi-agent systems, multi-agent RL (MARL) must not only learn the control policy but also account for interactions with all other agents in the environment, mutual influences among different system components, and the distribution of computational resources. This increases the complexity of algorithm design and places higher demands on computational resources. At the same time, simulators are crucial for obtaining realistic data, which is the foundation of RL. In this paper, we first propose a series of metrics for simulators and summarize the features of existing benchmarks. Second, to ease comprehension, we recall the foundational knowledge and then synthesize recent advances in MARL-related autonomous driving and intelligent transportation systems. Specifically, we examine their environmental modeling, state representation, perception units, and algorithm design. Finally, we discuss open challenges as well as prospects and opportunities. We hope this paper can help researchers integrate MARL technologies and trigger more insightful ideas toward intelligent and autonomous driving.
