Flow: A Modular Learning Framework for Mixed Autonomy Traffic

Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen

TL;DR

This paper tackles how autonomous vehicles influence traffic during the early adoption phase by introducing Flow, a modular, open-source framework that uses deep reinforcement learning to compose reusable traffic scenarios. Flow decouples system dynamics from control laws, enabling data-driven learning of AV policies that improve system-wide velocity under partial AV penetration and across diverse topologies (single/multi-lane tracks, intersections). Empirically, learned policies achieve near-optimal performance, generalize to unseen densities, and can outperform model-based baselines, with partial observability sometimes easing training and yielding interpretable control laws. The work lays a foundation for scalable, evidence-based study of mixed autonomy and provides pathways for extending Flow to broader networks and automation modalities.

Abstract

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts as AVs are adopted is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light on near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic, surpassing all known model-based controllers to achieve near-optimal performance, and to generalize to out-of-distribution traffic densities.

Paper Structure

This paper contains 30 sections, 12 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Example network modules supported by the Flow framework. Top left: Single-lane circular track. Top middle: Multi-lane circular track. Top right: Figure-eight road network. Bottom left: Intersection network. Bottom middle: Closed loop merge network. Bottom right: Imported San Francisco network. In Flow, scenarios can be generated using OpenStreetMap (OSM) data and vehicle dynamics models from the traffic microsimulation package SUMO.
  • Figure 2: Flow is a modular learning framework, which enables the composition of diverse mixed autonomy traffic scenarios for study with deep reinforcement learning. Scenarios conform to a (PO)MDP interface (dotted rectangles). Modules (solid rectangles) of varying types are composed to form scenarios. Actors may include vehicles or infrastructure, and may be learned or pre-specified. Additional dynamics may include vehicle fail-safes, right-of-way rules, and physical limitations. Additional parameters (hexagons) may also be configured. Flow invokes external libraries for RL training and simulation.
  • Figure 3: Performance of AV control laws for the single-lane mixed autonomy track. The overall system velocities of the learned (GRU, MLP, and Linear) and model-based (FollowerStopper and PI Saturation) control laws are averaged over the final 100 s of simulation time and over ten runs at each evaluated density. Also displayed are the upper and lower performance bounds, derived from the unstable and stable system limit cycles, respectively. The white and gray regions indicate the training-time and testing-time densities, respectively.
  • Figure 4: Velocity profile for the single-lane mixed autonomy track. Sample evaluations start with 300 seconds in which the AV is overridden by IDM human driving behavior to allow traffic waves to form, followed by 300 seconds under each of four different AV control laws. Both learned control laws bring the system close to the 4.82 m/s uniform flow velocity. A successful evaluation of the PI Saturation controller is shown; however, its performance can be inconsistent across episodes. The FollowerStopper falls short, settling at 4.15 m/s. The GRU control law reaches the optimal velocity fastest.
  • Figure 5: Space-time trajectories for single-lane mixed autonomy track, for learned and model-based AV control laws. Traffic waves form during the first 300 s.
  • ...and 3 more figures
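
The evaluation metric described in the Figure 3 caption (system velocity averaged over the final 100 s of simulation and over the runs at a given density) can be sketched roughly as below. This is a minimal illustration and not code from Flow: the function names, the `dt` step size, and the synthetic speed data are all assumptions made for the example. The last line relates the two velocities quoted in the Figure 4 caption.

```python
from statistics import mean


def final_window_velocity(speeds, dt, window_s=100.0):
    """Average system speed over the last `window_s` seconds of one run.

    speeds: list of time steps, each a list of per-vehicle speeds in m/s.
    dt: simulation step in seconds (hypothetical parameter for this sketch).
    """
    n_steps = max(1, round(window_s / dt))
    tail = speeds[-n_steps:]
    # Mean over vehicles at each step, then mean over the steps in the window.
    return mean(mean(step) for step in tail)


def evaluate(runs, dt, window_s=100.0):
    """Average the final-window velocity over several runs at one density."""
    return mean(final_window_velocity(run, dt, window_s) for run in runs)


# Synthetic data (not results from the paper): two runs, two vehicles each,
# 150 steps at dt = 1 s, so the final 100 steps form the averaging window.
run_a = [[2.0, 2.0]] * 50 + [[4.0, 5.0]] * 100
run_b = [[1.0, 1.0]] * 50 + [[3.0, 4.0]] * 100
score = evaluate([run_a, run_b], dt=1.0)

# Velocities quoted in the Figure 4 caption: FollowerStopper vs. uniform flow.
ratio = 4.15 / 4.82
print(score, round(ratio, 3))
```

With the caption's numbers, the FollowerStopper settles at roughly 86% of the uniform flow velocity, whereas the learned control laws are described as coming close to the full 4.82 m/s.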