Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment

Leizhen Wang, Peibo Duan, Cheng Lyu, Zewen Wang, Zhiqiang He, Nan Zheng, Zhenliang Ma

TL;DR

This work tackles scalability and reliability limits in multi-agent reinforcement learning for traffic assignment by redefining agents as origin-destination (OD) pair routers and introducing a Dirichlet-based action space with action pruning. The methodology uses local relative-gap rewards within a DEC-POMDP framework to guide learning, yielding strong performance on medium-sized networks and substantial efficiency gains over conventional methods. Empirical results show improved convergence, reduced agent count, and near-optimal solutions within limited step budgets, enabling potential real-time adaptive routing applications. The approach offers a practical path toward scalable, reliable MARL-based traffic management in real-world urban networks, with future work extending to high-fidelity simulators and larger-scale deployments.

Abstract

The evolution of metropolitan cities and the increase in travel demands impose stringent requirements on traffic assignment methods. Multi-agent reinforcement learning (MARL) approaches outperform traditional methods in modeling adaptive routing behavior without requiring explicit system dynamics, which is beneficial for real-world deployment. However, MARL frameworks face challenges in scalability and reliability when managing extensive networks with substantial travel demand, limiting their practical applicability to large-scale traffic assignment problems. To address these challenges, this study introduces MARL-OD-DA, a new MARL framework for the traffic assignment problem that redefines agents as origin-destination (OD) pair routers rather than individual travelers, significantly enhancing scalability. Additionally, a Dirichlet-based action space with action pruning and a reward function based on the local relative gap are designed to enhance solution reliability and improve convergence efficiency. Experiments demonstrate that the proposed MARL framework effectively handles medium-sized networks with extensive and varied city-level OD demand, surpassing existing MARL methods. When implemented in the SiouxFalls network, MARL-OD-DA achieves better assignment solutions in 10 steps, with a relative gap that is 94.99% lower than that of conventional methods.
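
The abstract names two mechanisms without giving formulas: a Dirichlet-based action with pruning, and a reward built on the local relative gap. The sketch below illustrates one plausible reading of each for a single OD-pair agent. It is not the authors' implementation. The function names, the pruning threshold, and the specific gap formula (the standard path-based relative gap restricted to one OD pair) are all assumptions made for illustration.

```python
import random


def sample_route_split(alpha, rng, prune_below=0.05):
    """Sample flow-split ratios over candidate routes from Dirichlet(alpha).

    A Dirichlet draw is obtained by normalizing independent Gamma(alpha_i, 1)
    draws. The pruning rule (zero out shares below `prune_below`, then
    renormalize) is a hypothetical stand-in for the paper's action pruning.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    ratios = [d / total for d in draws]

    # Always keep the largest share so at least one route carries flow.
    best = max(range(len(ratios)), key=lambda i: ratios[i])
    pruned = [r if (r >= prune_below or i == best) else 0.0
              for i, r in enumerate(ratios)]
    kept = sum(pruned)
    return [r / kept for r in pruned]


def local_relative_gap(path_flows, path_costs):
    """Assumed form: (sum_p f_p c_p - d * min_p c_p) / sum_p f_p c_p,
    where d is the OD demand (the sum of its path flows)."""
    total_cost = sum(f * c for f, c in zip(path_flows, path_costs))
    if total_cost == 0.0:
        return 0.0
    demand = sum(path_flows)
    return (total_cost - demand * min(path_costs)) / total_cost


rng = random.Random(0)
split = sample_route_split([2.0, 1.0, 0.5], rng)   # shares over 3 routes
flows = [100.0 * s for s in split]                 # OD demand of 100 trips
reward = -local_relative_gap(flows, [5.0, 7.0, 9.0])  # assumed reward sign
```

Under this reading the gap is zero when all of an OD pair's flow uses its cheapest route at current costs, so a reward of the negative gap pushes each agent toward a user-equilibrium split.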

Paper Structure

This paper contains 23 sections, 14 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: DEC-POMDP framework for traffic assignment
  • Figure 2: Softmax-based and Dirichlet-based strategy (GD: Gaussian Distribution)
  • Figure 3: Urban transportation networks and corresponding agent numbers
  • Figure 4: Training curves across three transportation networks
  • Figure 5: Performance comparison of trained agents and conventional methods across three transportation networks