A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control
Anirudh Satheesh, Keenan Powell
TL;DR
This work reframes Adaptive Traffic Signal Control as a constrained multi-agent reinforcement learning problem to simultaneously optimize traffic throughput and satisfy real-world constraints. It introduces MAPPO-LCE, a MAPPO-based algorithm enhanced with a Lagrange multiplier framework and a trainable cost estimator to enforce GreenTime, GreenSkip, and PhaseSkip constraints across intersections. Empirical results in CityFlow with Hangzhou, Jinan, and New York datasets show MAPPO-LCE consistently outperforms IPPO, MAPPO, and QTRAN in reward, throughput, and delay metrics while reducing constraint violations, especially in more complex networks. The approach demonstrates the feasibility and potential impact of constrained MARL for scalable, real-world ATSC deployment, with open-source code provided for replication.
Abstract
Traffic congestion in modern cities is exacerbated by the limitations of traditional fixed-time traffic signal systems, which fail to adapt to dynamic traffic patterns. Adaptive Traffic Signal Control (ATSC) algorithms have emerged as a solution by dynamically adjusting signal timing based on real-time traffic conditions. However, the main limitation of such methods is that they do not transfer to environments with real-world constraints, such as balancing efficiency, minimizing collisions, and ensuring fairness across intersections. In this paper, we view the ATSC problem as a constrained multi-agent reinforcement learning (MARL) problem and propose a novel algorithm named Multi-Agent Proximal Policy Optimization with Lagrange Cost Estimator (MAPPO-LCE) to produce effective traffic signal control policies. Our approach integrates the method of Lagrange multipliers to balance rewards and constraints, with a cost estimator for stable adjustment. We also introduce three constraints on the traffic network: GreenTime, GreenSkip, and PhaseSkip, which penalize traffic policies that do not conform to real-world scenarios. Our experimental results on three real-world datasets demonstrate that MAPPO-LCE outperforms three baseline MARL algorithms across all environments and traffic constraints (improving on MAPPO by 12.60%, IPPO by 10.29%, and QTRAN by 13.10%). Our results show that constrained MARL is a valuable tool for traffic planners to deploy scalable and efficient ATSC methods in real-world traffic networks. We provide code at https://github.com/Asatheesh6561/MAPPO-LCE.
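
To make the idea of balancing rewards and constraints with a Lagrange multiplier more concrete, the following is a minimal sketch of a generic Lagrangian update, not the authors' implementation. The function names `lagrangian_advantage` and `update_multiplier`, the learning rate, and the cost limit are all hypothetical, and the trainable cost estimator described above is stood in for by a plain number passed as `estimated_cost`.

```python
def lagrangian_advantage(reward_adv, cost_adv, lam):
    """Combine reward and cost advantages into one penalized training signal.

    Assumption: a standard Lagrangian relaxation, where the cost term is
    weighted by the multiplier `lam` and subtracted from the reward term.
    """
    return reward_adv - lam * cost_adv


def update_multiplier(lam, estimated_cost, cost_limit, lr=0.05):
    """Projected dual ascent on the multiplier.

    The multiplier grows when the estimated cost exceeds the limit and
    shrinks otherwise; it is clipped at zero so it never rewards violations.
    `estimated_cost` stands in for the output of a learned cost estimator.
    """
    return max(0.0, lam + lr * (estimated_cost - cost_limit))


# Hypothetical sequence of estimated constraint costs over three updates.
lam = 0.0
for estimated_cost in [3.0, 2.0, 0.5]:
    lam = update_multiplier(lam, estimated_cost, cost_limit=1.0)

# Penalized advantage for one hypothetical sample under the current multiplier.
penalized = lagrangian_advantage(reward_adv=1.0, cost_adv=2.0, lam=lam)
```

In this sketch the multiplier rises while the estimated cost is above the limit and decays once it falls below, which is the usual mechanism by which a Lagrangian method trades reward against constraint violations.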
