Traffic Smoothing Controllers for Autonomous Vehicles Using Deep Reinforcement Learning and Real-World Trajectory Data

Nathan Lichtlé, Kathy Jang, Adit Shah, Eugene Vinitsky, Jonathan W. Lee, Alexandre M. Bayen

TL;DR

This work addresses energy efficiency in mixed-autonomy highway traffic by training a single RL-equipped AV to damp stop-and-go waves using real-world trajectory data. The policy is formulated as a POMDP with downstream traffic observations and an INRIX-based speed planner, and it is trained with PPO to minimize energy consumption and smoothing-related costs. The approach achieves fuel savings of over 15% on trajectories with many stop-and-go waves at a low AV penetration rate of 4%, and its robustness to lane-changing and to the loss of downstream information is analyzed. The findings demonstrate the practical potential of data-driven, downstream-informed control for improving traffic energy efficiency in realistic settings, with clear directions for future multi-agent extensions and field validation.
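The POMDP observation combines local information about the leading vehicle with downstream traffic data. The sketch below shows one way such an observation vector could be assembled. The feature ordering, the normalization constants (`speed_scale`, `gap_scale`), the number of downstream values (`n_downstream`), the inclusion of the AV's own speed, and the zero-filling used when downstream data is unavailable are all assumptions for illustration, not the paper's exact design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocalState:
    """Local quantities an AV can measure with its own sensors (SI units)."""
    ego_speed: float     # m/s, the AV's own speed (assumed to be observed)
    leader_speed: float  # m/s, speed of the vehicle directly in front
    space_gap: float     # m, distance to the vehicle in front


def build_observation(
    local: LocalState,
    downstream_speeds: Optional[List[float]],
    n_downstream: int = 3,      # hypothetical number of downstream speed samples
    speed_scale: float = 40.0,  # hypothetical normalization constant (m/s)
    gap_scale: float = 100.0,   # hypothetical normalization constant (m)
) -> List[float]:
    """Return a fixed-length, normalized observation vector.

    Passing downstream_speeds=None mimics an ablation without downstream
    information: those slots are zero-filled so that the vector length
    (and therefore the policy's input size) stays the same.
    """
    local_features = [
        local.ego_speed / speed_scale,
        local.leader_speed / speed_scale,
        local.space_gap / gap_scale,
    ]
    if downstream_speeds is None:
        downstream_features = [0.0] * n_downstream
    else:
        if len(downstream_speeds) != n_downstream:
            raise ValueError(f"expected {n_downstream} downstream speeds")
        downstream_features = [v / speed_scale for v in downstream_speeds]
    return local_features + downstream_features


if __name__ == "__main__":
    state = LocalState(ego_speed=20.0, leader_speed=10.0, space_gap=50.0)
    print(build_observation(state, [8.0, 12.0, 16.0]))
    print(build_observation(state, None))
```

Keeping the vector length fixed is a deliberate choice here: a neural-network policy expects a constant input size, so removing downstream information is modeled by masking rather than by dropping features.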

Abstract

Designing traffic-smoothing cruise controllers that can be deployed on autonomous vehicles is a key step towards improving traffic flow, reducing congestion, and enhancing fuel efficiency in mixed-autonomy traffic. We bypass the common issue of having to carefully fine-tune a large traffic microsimulator by leveraging real-world trajectory data from the I-24 highway in Tennessee, replayed in a one-lane simulation. Using standard deep reinforcement learning methods, we train energy-reducing wave-smoothing policies. As inputs to the agent, we use the speed of and distance to only the vehicle in front, local states readily available on most recent vehicles, together with non-local observations of the downstream traffic state. We show that at a low 4% autonomous vehicle penetration rate, we achieve significant fuel savings of over 15% on trajectories exhibiting many stop-and-go waves. Finally, we analyze the smoothing effect of the controllers and demonstrate robustness to adding lane-changing into the simulation as well as to the removal of downstream information.
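The headline numbers tie a penetration rate to a number of AVs in the simulated platoon and express fuel savings relative to an all-human baseline. The sketch below makes that arithmetic explicit. Placing the first AV at the front of the platoon (index 0) and using total energy as the savings metric are illustrative assumptions, not details taken from the paper, and the energy values mentioned after the block are made up.

```python
from typing import List


def av_positions(n_vehicles: int, penetration: float) -> List[int]:
    """Indices (0 = front of the platoon) of equally spaced AVs.

    Assumes the first AV is placed at index 0; the paper's exact offset may differ.
    """
    n_avs = round(n_vehicles * penetration)
    if n_avs == 0:
        return []
    spacing = n_vehicles // n_avs  # e.g. 200 // 8 = 25 vehicles per AV
    return [k * spacing for k in range(n_avs)]


def energy_savings_percent(baseline_energy: float, controlled_energy: float) -> float:
    """Relative reduction in total energy versus the all-human baseline, in percent."""
    if baseline_energy <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_energy - controlled_energy) / baseline_energy
```

For a 200-vehicle platoon, `av_positions(200, 0.04)` yields 8 AVs, one every 25 vehicles, and `av_positions(200, 0.10)` yields 20 AVs, one every 10 vehicles. With hypothetical totals, `energy_savings_percent(100.0, 85.0)` returns `15.0`, so savings "over 15%" means the controlled platoon uses less than 85% of the baseline energy.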

Paper Structure (16 sections, 4 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Evolution of the speed of the trajectory leader and of the AVs in a platoon of 200 vehicles. The 8 AVs are equally spaced, giving a 4% penetration rate. The first AV in the platoon is shown in blue, the following AVs are drawn with decreasing opacity, and the last one is shown in green. The plot illustrates the smoothing effect of the AVs on the leader trajectory: the first AV (blue) already smooths the trajectory leader (red), slowing down less and accelerating less sharply, and thus saves energy.
  • Figure 2: Trajectories used for evaluation, numbered 1 to 6. The first corresponds to free flow, while the other five contain both low and high speeds, including sharp braking, sharp acceleration, or stop-and-go behavior.
  • Figure 3: Time-space diagrams, each representing a simulation of 200 vehicles. Each vehicle trajectory is plotted as a line in position-time space, colored by that vehicle's speed, which shows the wave-smoothing effect of the RL-controlled AVs over time. Horizontal lines show the throughput of the traffic flow at that particular position. As a warm-up, all AVs behave as human drivers while their position is negative. Top left: all 200 vehicles follow the Intelligent Driver Model (IDM); traffic waves appear in red and black, while bright green represents free flow. Top right: 20 equally spaced RL-controlled AVs (10% penetration rate, 1 AV every 10 vehicles); the larger gaps opened by the AVs appear as white lines between platoons. Bottom left: 8 equally spaced RL-controlled AVs (4% penetration rate, 1 AV every 25 vehicles). Bottom right: 8 equally spaced RL-controlled AVs (4% penetration rate) with the lane-changing model enabled (it is disabled in the other three panels).
  • Figure 4: Space gap of the first AV in the platoon (blue) over time, on the same scenario as in Fig. 1. The orange line shows the gap-closing threshold $h_t^\text{max}$ and the green line shows the failsafe threshold $h_t^\text{min}$, both introduced in the section on the action space.
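The captions mention two simulation ingredients: human drivers modeled with the Intelligent Driver Model (IDM), and an AV action constrained by a gap-closing threshold $h_t^\text{max}$ and a failsafe threshold $h_t^\text{min}$. The sketch below implements the standard IDM acceleration law and a hypothetical wrapper, `apply_gap_thresholds`, that overrides the RL acceleration when the space gap leaves the band between the two thresholds. The IDM parameter values, the override accelerations, and the override rule itself are assumptions for illustration; the paper defines its own thresholds and behavior in its action-space section.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IDMParams:
    """Illustrative IDM parameters (not the paper's calibrated values)."""
    v0: float = 30.0    # desired speed, m/s
    T: float = 1.0      # desired time headway, s
    a: float = 1.3      # maximum acceleration, m/s^2
    b: float = 2.0      # comfortable deceleration, m/s^2
    s0: float = 2.0     # minimum jam distance, m
    delta: float = 4.0  # acceleration exponent


def idm_accel(v: float, v_lead: float, gap: float, p: IDMParams = IDMParams()) -> float:
    """Intelligent Driver Model acceleration for a follower with speed v."""
    if gap <= 0:
        raise ValueError("space gap must be positive")
    dv = v - v_lead  # approach rate, positive when closing in on the leader
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a * p.b)))
    return p.a * (1.0 - (v / p.v0) ** p.delta - (s_star / gap) ** 2)


def apply_gap_thresholds(
    rl_accel: float,
    gap: float,
    h_min: float,
    h_max: float,
    closing_accel: float = 0.5,    # hypothetical minimum accel when the gap is too large
    failsafe_accel: float = -3.0,  # hypothetical minimum braking when the gap is too small
) -> float:
    """Hypothetical override of the RL action based on the space gap.

    - gap < h_min: brake at least as hard as `failsafe_accel`.
    - gap > h_max: accelerate at least at `closing_accel` to close the gap.
    - otherwise: keep the RL policy's acceleration unchanged.
    """
    if gap < h_min:
        return min(rl_accel, failsafe_accel)
    if gap > h_max:
        return max(rl_accel, closing_accel)
    return rl_accel
```

Taking `min`/`max` against the override values, rather than replacing the action outright, keeps any RL action that is already more conservative than the override (harder braking near $h_t^\text{min}$, stronger acceleration beyond $h_t^\text{max}$).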