Scalable and Reliable Multi-agent Reinforcement Learning for Traffic Assignment

Leizhen Wang; Peibo Duan; Cheng Lyu; Zewen Wang; Zhiqiang He; Nan Zheng; Zhenliang Ma

TL;DR

This work addresses the scalability and reliability limits of multi-agent reinforcement learning (MARL) for traffic assignment by redefining agents as origin-destination (OD) pair routers, rather than individual travelers, and by introducing a Dirichlet-based action space with action pruning. Learning is guided by a reward based on the local relative gap within a DEC-POMDP framework. Because agents correspond to OD pairs, the agent count is reduced, and the method handles medium-sized networks with city-level OD demand. On the SiouxFalls network it reaches a relative gap 94.99% lower than that of conventional methods within 10 steps, which points toward real-time adaptive routing applications. Future work extends the approach to high-fidelity simulators and larger-scale deployments.

Abstract

The evolution of metropolitan cities and the increase in travel demand impose stringent requirements on traffic assignment methods. Multi-agent reinforcement learning (MARL) approaches outperform traditional methods in modeling adaptive routing behavior without requiring explicit system dynamics, which is beneficial for real-world deployment. However, MARL frameworks face challenges in scalability and reliability when managing extensive networks with substantial travel demand, limiting their practical applicability to large-scale traffic assignment problems. To address these challenges, this study introduces MARL-OD-DA, a new MARL framework for the traffic assignment problem that redefines agents as origin-destination (OD) pair routers rather than individual travelers, significantly enhancing scalability. Additionally, a Dirichlet-based action space with action pruning and a reward function based on the local relative gap are designed to enhance solution reliability and improve convergence efficiency. Experiments demonstrate that the proposed MARL framework effectively handles medium-sized networks with extensive and varied city-level OD demand, surpassing existing MARL methods. On the SiouxFalls network, MARL-OD-DA achieves better assignment solutions in 10 steps, with a relative gap that is 94.99% lower than that of conventional methods.
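
To make the two design elements named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that an OD-pair agent's action is a vector of route-flow proportions drawn from a Dirichlet distribution, that pruning is expressed as a boolean mask over candidate routes, and that the local relative gap takes the common per-OD form (total route cost minus demand times the minimum route cost, divided by total route cost). The function names `sample_route_split` and `local_relative_gap`, the mask representation, and the example numbers are all hypothetical.

```python
import numpy as np


def sample_route_split(alpha, keep_mask, rng):
    """Sample route proportions for one OD pair (illustrative assumption).

    alpha     : positive Dirichlet concentration per candidate route
    keep_mask : True for routes that survive pruning
    Pruned routes receive a proportion of exactly zero; the remaining
    proportions are non-negative and sum to one.
    """
    alpha = np.asarray(alpha, dtype=float)
    keep_mask = np.asarray(keep_mask, dtype=bool)
    split = np.zeros_like(alpha)
    split[keep_mask] = rng.dirichlet(alpha[keep_mask])
    return split


def local_relative_gap(route_flows, route_costs):
    """Per-OD relative gap under an assumed, commonly used definition.

    gap = (sum_k f_k * c_k - d * min_k c_k) / sum_k f_k * c_k,
    where d is the OD demand (the sum of the route flows).
    The gap is zero when all flow uses minimum-cost routes.
    """
    route_flows = np.asarray(route_flows, dtype=float)
    route_costs = np.asarray(route_costs, dtype=float)
    total_cost = float(route_flows @ route_costs)
    if total_cost == 0.0:
        return 0.0
    best_cost = float(route_flows.sum() * route_costs.min())
    return (total_cost - best_cost) / total_cost


rng = np.random.default_rng(0)
demand = 100.0                       # hypothetical OD demand
split = sample_route_split([2.0, 1.0, 0.5], [True, True, False], rng)
flows = demand * split               # flow assigned to each candidate route
costs = np.array([10.0, 12.0, 30.0]) # hypothetical route travel times
reward = -local_relative_gap(flows, costs)  # assumed sign: smaller gap, larger reward
```

Sampling from a Dirichlet keeps every action on the probability simplex by construction, so the assigned route flows always sum to the OD demand without a separate normalization step.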
