Multi-Agent Reinforcement Learning in Intelligent Transportation Systems: A Comprehensive Survey

Rexcharles Donatus, Kumater Ter, Daniel Udekwe

TL;DR

This survey introduces a structured taxonomy that categorizes MARL approaches by coordination model and learning algorithm, spanning value-based, policy-based, actor-critic, and communication-enhanced frameworks. It also identifies core challenges, including scalability, non-stationarity, credit assignment, communication constraints, and the sim-to-real transfer gap.

Abstract

The growing complexity of urban mobility and the demand for efficient, sustainable, and adaptive solutions have positioned Intelligent Transportation Systems (ITS) at the forefront of modern infrastructure innovation. At the core of ITS lies the challenge of autonomous decision-making in dynamic, large-scale, and uncertain environments where multiple agents (traffic signals, autonomous vehicles, or fleet units) must coordinate effectively. Multi-Agent Reinforcement Learning (MARL) offers a promising paradigm for addressing these challenges by enabling distributed agents to jointly learn strategies that balance individual objectives with system-wide efficiency. This paper presents a comprehensive survey of MARL applications in ITS. We introduce a structured taxonomy that categorizes MARL approaches according to coordination models and learning algorithms, spanning value-based, policy-based, actor-critic, and communication-enhanced frameworks. Applications are reviewed across key ITS domains, including traffic signal control, connected and autonomous vehicle coordination, logistics optimization, and mobility-on-demand systems. Furthermore, we highlight widely used simulation platforms such as SUMO, CARLA, and CityFlow that support MARL experimentation, along with emerging benchmarks. The survey also identifies core challenges, including scalability, non-stationarity, credit assignment, communication constraints, and the sim-to-real transfer gap, which continue to hinder real-world deployment.

Paper Structure

This paper contains 30 sections, 48 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: Illustration of the reinforcement learning loop: the agent interacts with the environment by taking actions $a_t$, and in return receives the next state $s_{t+1}$ and reward $r_t$, forming a continuous feedback cycle for learning optimal behavior.
  • Figure 2: Hierarchy of Reinforcement Learning Methods: Categorizing Approaches into Model-Free and Model-Based
  • Figure 3: Illustration of the Actor-Critic Reinforcement Learning Framework
  • Figure 4: Multi-agent reinforcement learning architectures: (a) Decentralized Training and Decentralized Execution (DTDE), in which agents learn independently with local observations and rewards; (b) Centralized Training with Centralized Execution (CTCE), in which agents are trained and executed using shared global information; (c) Centralized Training with Decentralized Execution (CTDE), in which agents are trained with global information but execute using only local observations.
  • Figure 5: Illustration of the Value Decomposition Network (VDN) framework for multi-agent reinforcement learning. Each agent receives its own local observation ($o^1_t$, $o^2_t$) and uses an individual Q-network (Q-Net) to estimate its action-value function. Based on these estimates, actions ($a^1_t$, $a^2_t$) are selected. The corresponding Q-values are then aggregated (summed) to compute the joint action-value function $q_t$, which is used for centralized training.
  • ...and 7 more figures
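
The aggregation step described in the Figure 5 caption can be illustrated with a minimal sketch. It assumes two agents with three discrete actions each and uses made-up Q-values in place of trained Q-networks; the function names `greedy_action` and `vdn_joint_q` are hypothetical and are not taken from the paper.

```python
from itertools import product
from typing import List, Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the largest Q-value (an agent's local greedy action)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def vdn_joint_q(per_agent_q: List[Sequence[float]], actions: Sequence[int]) -> float:
    """VDN-style joint value: the sum of each agent's Q-value for its chosen action."""
    return sum(q[a] for q, a in zip(per_agent_q, actions))


# Illustrative (assumed) per-agent Q-values standing in for Q-network outputs
# computed from local observations o^1_t and o^2_t.
q1 = [0.25, 0.75, 0.5]
q2 = [0.5, 0.125, 1.0]

# Each agent selects its action from its own Q-values only.
actions = [greedy_action(q1), greedy_action(q2)]

# The selected Q-values are summed into the joint value used for centralized training.
q_tot = vdn_joint_q([q1, q2], actions)

# Because the joint value is a plain sum, the best joint action coincides
# with the per-agent greedy actions; a brute-force search confirms it here.
best_joint = max(product(range(3), range(3)), key=lambda js: vdn_joint_q([q1, q2], js))
```

The additive form is what allows decentralized execution in this sketch: each agent can act greedily on its own Q-values, and the result still maximizes the summed joint value.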