Field Deployment of Multi-Agent Reinforcement Learning Based Variable Speed Limit Controllers

Yuhang Zhang, Zhiyao Zhang, Marcos Quiñones-Grueiro, William Barbour, Clay Weston, Gautam Biswas, Daniel Work

TL;DR

This work demonstrates the first field deployment of a MARL-based variable speed limit controller on a 17-mile section of I-24 near Nashville, trained in simulation and implemented across 67 gantries with AI-DSS integration to a Traffic Management Center. It combines invalid action masking and safety guards to satisfy real-world constraints while maintaining a high degree of autonomous policy operation, achieving up to 98% intervention-free decisions and over 10 million control actions across 8 million trips. The study also quantifies domain mismatch between simulation and reality using Wasserstein distance and shows the policy's robustness under real-world variability, highlighting the practical potential of scalable MARL for infrastructure control. These results support the viability of simulation-trained MARL policies for real-time traffic management and point to avenues for richer datasets, broader deployments, and deeper performance analyses against traditional safety and mobility metrics.
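
The summary above does not spell out how the Wasserstein distance is computed, so the following is only a minimal sketch of the general idea, assuming the comparison is made between one-dimensional samples (for example, speeds observed in simulation versus speeds observed in the field). The function name `wasserstein_1d` and the sample values are hypothetical and are not taken from the paper.

```python
def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    Illustrative helper, not the authors' implementation.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))  # every location where a CDF can jump
    i = j = 0
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Advance the counters so that i and j count the samples <= left.
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        cdf_gap = abs(i / len(xs) - j / len(ys))
        distance += cdf_gap * (right - left)
    return distance


# Hypothetical speed samples in mph (made-up values for illustration only).
sim_speeds = [55.0, 60.0, 65.0]
field_speeds = [45.0, 50.0, 55.0]
mismatch = wasserstein_1d(sim_speeds, field_speeds)
```

A distance of zero means the two samples have identical empirical distributions; larger values indicate that more "mass" must be moved, and moved farther, to turn one distribution into the other.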

Abstract

This article presents the first field deployment of a multi-agent reinforcement-learning (MARL) based variable speed limit (VSL) control system on the I-24 freeway near Nashville, Tennessee. We describe how we train MARL agents in a traffic simulator and directly deploy the simulation-based policy on a 17-mile stretch of Interstate 24 with 67 VSL controllers. We use invalid action masking and several safety guards to ensure that the posted speed limits satisfy the real-world constraints from the traffic management center and the Tennessee Department of Transportation. From its launch through April 2024, the system has made approximately 10,000,000 decisions across 8,000,000 trips. The analysis of the controller shows that the MARL policy is in control up to 98% of the time without intervention from safety guards. The time-space diagrams of traffic speed and control commands illustrate how the algorithm behaves during rush hour. Finally, we quantify the domain mismatch between the simulation and real-world data and demonstrate the robustness of the MARL policy to this mismatch.
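
Invalid action masking, as mentioned in the abstract, generally means removing disallowed actions from the policy's output before an action is chosen. The sketch below illustrates that general technique only. The discrete action set, the 10 mph step constraint relative to the downstream gantry, and all function names are assumptions made for the example; the abstract does not state the actual constraints or the policy architecture.

```python
import math

# Hypothetical discrete action set (mph); not taken from the paper.
SPEED_LIMITS = [30, 40, 50, 60, 70]


def valid_mask(downstream_limit, max_step=10):
    """Mark an action valid if it is within max_step mph of the downstream limit.

    The constraint itself is an assumption chosen for illustration.
    """
    return [abs(limit - downstream_limit) <= max_step for limit in SPEED_LIMITS]


def masked_softmax(logits, mask):
    """Softmax over logits with invalid actions forced to zero probability."""
    if not any(mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else -math.inf for l, ok in zip(logits, mask)]
    peak = max(masked)  # finite, because at least one action is valid
    exps = [math.exp(l - peak) for l in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, mask):
    """Greedy choice among the valid actions only."""
    probs = masked_softmax(logits, mask)
    best = max(range(len(probs)), key=probs.__getitem__)
    return SPEED_LIMITS[best]


# The raw policy output prefers 70 mph, but the mask rules it out.
logits = [0.1, 0.2, 0.3, 0.4, 5.0]
mask = valid_mask(downstream_limit=40)
chosen = select_action(logits, mask)
```

Because masked actions receive exactly zero probability, the choice stays within the constraint while still coming from the policy's own preferences. A safety guard, by contrast, would replace the policy output altogether.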

Paper Structure (22 sections, 6 equations, 7 figures, 1 table)

Figures (7)

  • Figure 1: The MARL-based VSL control system on I-24 Westbound: This figure shows four consecutive gantries from a driver's perspective when approaching a congestion tail. As drivers proceed, they encounter progressively reduced speed limits of 60, 50, 40, and 30 mph displayed on each gantry, sequentially alerting them to the upcoming slow-down pattern.
  • Figure 2: Deployment pipeline of our MARL-based VSL: Step 1: We trained 8 agents in TransModeler on a 7-mile stretch of I-24 and then tested the policy with 34 agents on a 17-mile stretch of westbound I-24 under various simulation parameters. Step 2: We extracted the optimal policy learned from simulation and applied invalid action masking and safety guards to satisfy real-world constraints. Step 3: We tested the behavior of the proposed MARL-based VSL control algorithm in an open-loop manner, with I-24 sensor data streaming continuously into the Artificial-Intelligence Decision Support System (AI-DSS), the infrastructure software that communicates with the Traffic Management Center (TMC). Step 4: On March 8, 2024, we deployed the MARL-based VSL control algorithm in a closed-loop manner across 67 VSL gantries spanning a 17-mile stretch of I-24 with nearly 160,000 daily commuters.
  • Figure 3: The deployed VSL control algorithm, centered around a MARL policy, considers all real-world constraints. IAM represents "Invalid Action Masking" and SM represents "Speed-Matching".
  • Figure 4: Overview of the VSL deployment segment in both directions on the I-24 SMART Corridor. The left direction heads toward downtown Nashville and the right toward Murfreesboro. RDS denotes Radar Detection System, the traffic sensor installed on I-24.
  • Figure 5: The MARL-based VSL control algorithm's behavior during a randomly selected morning peak hour (Monday, April 22, 2024) on I-24 Westbound: (a) displays the time-space diagram of average traffic speed recorded by roadside RDS sensors every 30 seconds. The x-axis represents time and the y-axis represents mile markers; traffic travels upward along the y-axis toward Nashville. Three virtual vehicles, starting at 6am, 7am, and 8am, are simulated from the RDS speed data, and their trajectories are overlaid on the figure. (b) presents the time-space diagram of the control commands, updated every 30 seconds, for the 34 VSL gantries controlled by the MARL-based algorithm in this study. (c) shows the same diagram as (b) but with safety guard overrides masked in white. (d) details the time series of the travel speed and the speed limits encountered by each virtual vehicle generated in (a).
  • ...and 2 more figures
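
The caption of Figure 5 mentions virtual vehicles simulated from the RDS speed data but does not describe the procedure. One simple way such a trajectory could be generated is sketched below, assuming a grid of speeds sampled every 30 seconds, a nearest-sensor lookup, and a forward step of speed × time. The function name, the data layout, and the sample values are hypothetical.

```python
def simulate_virtual_vehicle(speed_field, sensor_positions, start_step,
                             start_pos, end_pos, dt_s=30.0):
    """Trace a virtual vehicle through a time-space speed field.

    speed_field[k][i] is the speed in mph at time step k and sensor i.
    sensor_positions[i] is the sensor location in miles along the segment.
    Returns a list of (time_s, position_miles) points.
    Illustrative sketch only, not the authors' procedure.
    """
    pos = start_pos
    step = start_step
    trajectory = [(step * dt_s, pos)]
    while pos < end_pos and step < len(speed_field):
        # Use the reading from the sensor closest to the current position.
        nearest = min(range(len(sensor_positions)),
                      key=lambda i: abs(sensor_positions[i] - pos))
        speed_mph = speed_field[step][nearest]
        pos += speed_mph * dt_s / 3600.0  # miles covered in one interval
        step += 1
        trajectory.append((step * dt_s, pos))
    return trajectory


# Made-up field: three sensors one mile apart, constant 60 mph for four steps.
sensors = [0.0, 1.0, 2.0]
field = [[60.0, 60.0, 60.0] for _ in range(4)]
path = simulate_virtual_vehicle(field, sensors, start_step=0,
                                start_pos=0.0, end_pos=1.0)
```

Looking up the posted speed limit at each point of the returned trajectory would then give the kind of per-vehicle time series described for panel (d).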