Field Deployment of Multi-Agent Reinforcement Learning Based Variable Speed Limit Controllers
Yuhang Zhang, Zhiyao Zhang, Marcos Quiñones-Grueiro, William Barbour, Clay Weston, Gautam Biswas, Daniel Work
TL;DR
This work demonstrates the first field deployment of a MARL-based variable speed limit controller on a 17-mile section of I-24 near Nashville, trained in simulation and implemented across 67 gantries with AI-DSS integration to a Traffic Management Center. It combines invalid action masking and safety guards to satisfy real-world constraints while maintaining a high degree of autonomous policy operation: up to 98% of decisions are made without safety-guard intervention, across approximately 10 million control actions and 8 million trips. The study also quantifies the domain mismatch between simulation and reality using the Wasserstein distance and shows that the policy is robust to real-world variability, highlighting the practical potential of scalable MARL for infrastructure control. These results support the viability of simulation-trained MARL policies for real-time traffic management and point to avenues for richer datasets, broader deployments, and deeper performance analyses against traditional safety and mobility metrics.
Abstract
This article presents the first field deployment of a multi-agent reinforcement learning (MARL) based variable speed limit (VSL) control system on the I-24 freeway near Nashville, Tennessee. We describe how we train MARL agents in a traffic simulator and directly deploy the simulation-trained policy on a 17-mile stretch of Interstate 24 with 67 VSL controllers. We use invalid action masking and several safety guards to ensure that the posted speed limits satisfy the real-world constraints set by the traffic management center and the Tennessee Department of Transportation. From its launch through April 2024, the system has made approximately 10,000,000 decisions on 8,000,000 trips. Our analysis of the controller shows that the MARL policy is in control up to 98% of the time without intervention from the safety guards. Time-space diagrams of traffic speed and control commands illustrate how the algorithm behaves during rush hour. Finally, we quantify the domain mismatch between the simulation and real-world data and demonstrate the robustness of the MARL policy to this mismatch.
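To make the idea of invalid action masking concrete, the following is a minimal sketch rather than the deployed implementation. It assumes a discrete set of candidate speed limits and a single illustrative constraint (the posted limit may differ from the downstream gantry's limit by at most a fixed step). The action set, the constraint, and all function names are hypothetical and are not taken from the deployed system.

```python
import math

# Hypothetical action set: candidate speed limits in mph (an assumption).
SPEED_LIMITS = [30, 40, 50, 60, 70]


def valid_action_mask(downstream_limit, max_step=10):
    """Mark an action valid only if it is within max_step mph of the
    downstream gantry's posted limit (an assumed, illustrative constraint)."""
    return [abs(v - downstream_limit) <= max_step for v in SPEED_LIMITS]


def masked_softmax(logits, mask):
    """Softmax restricted to valid actions; invalid actions get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    # Setting invalid logits to -inf removes them from the distribution.
    masked = [x if ok else -math.inf for x, ok in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def select_speed_limit(logits, downstream_limit):
    """Pick the most probable valid speed limit for one agent."""
    mask = valid_action_mask(downstream_limit)
    probs = masked_softmax(logits, mask)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SPEED_LIMITS[best], probs


# The raw policy prefers 30 mph, but the mask rules it out next to a 60 mph gantry.
limit, probs = select_speed_limit([5.0, 0.1, 0.2, 0.3, 0.4], downstream_limit=60)
```

Because masked actions receive zero probability, the policy can never select them, so a safety guard of the kind mentioned above would only need to handle conditions that the mask does not encode.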
