Reinforcement Learning Based Oscillation Dampening: Scaling up Single-Agent RL algorithms to a 100 AV highway field operational test

Kathy Jang; Nathan Lichtlé; Eugene Vinitsky; Adit Shah; Matthew Bunting; Matthew Nice; Benedetto Piccoli; Benjamin Seibold; Daniel B. Work; Maria Laura Delle Monache; Jonathan Sprinkle; Jonathan W. Lee; Alexandre M. Bayen

TL;DR

This work investigates reinforcement learning controllers for traffic flow smoothing in the MegaVanderTest, the largest field deployment of automated vehicles to date, encompassing about 100 AVs. It develops two RL controller families (acceleration-based and ACC-based) trained in a data-driven, single-lane simulator that mirrors I-24 traffic dynamics and IDM-based human drivers, and then migrates them to real vehicles via ROS/ONNX interfaces. The paper presents extensive simulation results and a full field test, demonstrating notable improvements in fuel economy and throughput as well as effective dampening of stop-and-go waves, with the deployed system achieving meaningful flow smoothing in congested conditions. Contributions include a scalable cloud-enabled FLOW framework, detailed problem formulations for both controller types, a data-driven training pipeline from real highway trajectories, and a rigorous hardware-validation workflow that bridges simulation to real-world autonomous vehicle control. The findings underscore the practical viability of RL for mixed-autonomy traffic management and highlight design choices that influence safety, generalization, and real-world deployability.

Abstract

In this article, we explore the technical details of the reinforcement learning (RL) algorithms that were deployed in the largest field test of automated vehicles designed to smooth traffic flow in history as of 2023, uncovering the challenges and breakthroughs that come with developing RL controllers for automated vehicles. We delve into the fundamental concepts behind RL algorithms and their application in the context of self-driving cars, discussing the developmental process from simulation to deployment in detail, from designing simulators to reward function shaping. We present the results in both simulation and deployment, discussing the flow-smoothing benefits of the RL controller. From understanding the basics of Markov decision processes to exploring advanced techniques such as deep RL, our article offers a comprehensive overview and deep dive of the theoretical foundations and practical implementations driving this rapidly evolving field. We also showcase real-world case studies and alternative research projects that highlight the impact of RL controllers in revolutionizing autonomous driving. From tackling complex urban environments to dealing with unpredictable traffic scenarios, these intelligent controllers are pushing the boundaries of what automated vehicles can achieve. Furthermore, we examine the safety considerations and hardware-focused technical details surrounding deployment of RL controllers into automated vehicles. As these algorithms learn and evolve through interactions with the environment, ensuring their behavior aligns with safety standards becomes crucial. We explore the methodologies and frameworks being developed to address these challenges, emphasizing the importance of building reliable control systems for automated vehicles.

Paper Structure (48 sections, 14 equations, 23 figures, 4 tables)

Introduction
Background
Reinforcement Learning
Human Driver Models
MegaController / Speed Planner
Features of FLOW
Traffic Control using FLOW
Simulation / Problem Formulation
Data Acquisition
...and 33 more sections

Figures (23)

Figure 1: Example setup of the trajectory simulator for evaluation. One vehicle, which we name the trajectory leader, replays a velocity trajectory from the I-24 Trajectory Dataset \cite{nice2021dataset}. Following it is a combination of IDM-controlled human vehicles and automated vehicles (AVs), all on a single lane. This evaluation setup contains 8 platoons, each consisting of one AV followed by 24 human vehicles. The human vehicles are used to assess the smoothing performance of the AV. During training, only one platoon is simulated.
Figure S: Diagram summarizing the design of the acceleration-based controller. The environment is in state $s_t$, from which we obtain a vector of observations $o_t$ that is fed into the neural network to get a raw acceleration $a_t^\text{raw}$. This acceleration is wrapped by a gap-closing term and a failsafe term if necessary. The resulting acceleration $a_t$ is applied to the AV and the whole simulation is updated. This leads to a new state $s_{t+1}$, and the process is repeated. Additionally, the environment computes a reward $r_t$ from the action, which is used to optimize the neural network.
Figure S: Trajectory leader vehicle replaying a trajectory from the dataset, followed by 10 platoons each composed of 1 AV and 19 IDM human vehicles, which corresponds to 200 vehicles (not including the trajectory leader) and a 5% AV penetration rate. The plot displays the speed of the trajectory leader as well as the first 4 AVs in the platoon. The remaining ones are omitted for visibility but follow a similar trend.
Figure S: Trajectory leader vehicle replaying a trajectory from the dataset (displayed in Figure \ref{fig:accel_result1}), followed by 10 platoons each composed of 1 AV and 19 IDM human vehicles, which corresponds to 200 vehicles (not including the trajectory leader) and a 5% AV penetration rate. We plot metrics aggregated over the whole simulation as a function of vehicle ID: 0 is the trajectory leader, 200 is the last vehicle in the last platoon, and the vehicles with ID $1 + 20k$ for $k \in \{ 0, 1, \dots, 9\}$ are the AVs (indicated by vertical red lines). From top to bottom, the aggregated metrics are speed variance, average speed, average miles per gallon and average space gap. Large spikes correspond to the start of a platoon.
Figure S: Trajectory leader vehicle replaying a trajectory from the dataset (displayed in Figure \ref{fig:accel_result1}), followed by 10 platoons each composed of 1 AV and 19 IDM human vehicles, which corresponds to 200 vehicles (not including the trajectory leader) and a 5% AV penetration rate. We plot a time-space diagram when AVs are IDM-controlled (top) and when they are RL-controlled (bottom). Colors correspond to vehicle speeds. The black color region corresponds to the part where the trajectory leader comes to a stop around the $t=300$s mark (see Figure \ref{fig:accel_result1}), which the AVs manage to dampen (bottom). On this particular trajectory, the AVs improve the fuel efficiency of all of the 200 vehicles by 12.67%, as evidenced by the smoothed out colors.
...and 18 more figures
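
The platoon layouts described in the captions above follow simple index arithmetic, which the short Python sketch below reproduces. It is only an illustration of the vehicle numbering stated in the captions, not the simulator's actual code: the function name `platoon_layout` and its parameters are hypothetical.

```python
def platoon_layout(num_platoons=10, humans_per_av=19):
    """Return a list of roles indexed by vehicle ID (hypothetical helper).

    ID 0 is the trajectory leader; each platoon is one AV followed by
    `humans_per_av` IDM-controlled human vehicles.
    """
    roles = ["leader"]
    for _ in range(num_platoons):
        roles.append("av")
        roles.extend(["human"] * humans_per_av)
    return roles


# Layout from the captions: 10 platoons of 1 AV + 19 human vehicles.
roles = platoon_layout()
av_ids = [vid for vid, role in enumerate(roles) if role == "av"]
num_followers = len(roles) - 1           # excludes the trajectory leader
penetration = len(av_ids) / num_followers

print(av_ids)         # AV IDs of the form 1 + 20k
print(num_followers)  # vehicles behind the trajectory leader
print(penetration)    # fraction of followers that are AVs
```

Calling `platoon_layout(8, 24)` gives the Figure 1 evaluation setup (8 platoons of one AV followed by 24 human vehicles), which by the same arithmetic also places 200 vehicles behind the trajectory leader.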
