Mixed Traffic Control and Coordination from Pixels

Michael Villarreal, Bibek Poudel, Jia Pan, Weizi Li

TL;DR

This work addresses congestion in mixed traffic by replacing precise sensor-based observations with bird's-eye-view image inputs for reinforcement learning policies controlling robot vehicles. Using Proximal Policy Optimization, the authors demonstrate that image-based observations achieve performance competitive with, and in some cases superior to, traditional precise observations across ring, figure-eight, merge, intersection, and bottleneck environments. The study highlights four key contributions: (1) validating BEV image observations as a generalizable, end-to-end input modality; (2) achieving comparable or improved throughput and wave-damping across multiple scenarios; (3) enabling reduced infrastructure and V2V/V2I requirements; and (4) offering insights into the limits and potential of image-based control for real-world traffic systems. The results suggest practical implications for deploying image-rich sensing in traffic networks and point to future work on scaling, robustness, and integration with predictive traffic signals and trajectory data.

Abstract

Traffic congestion is a persistent problem in our society. Previous methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles, given the increasing emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that require domain expertise and hand engineering for each road network's observation space. Additionally, precise observations use global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communication to potentially unwilling human drivers. We consider image observations, a modality that has not been extensively explored for mixed traffic control via RL, as the alternative: 1) images do not require a complete re-imagination of the observation space from environment to environment; 2) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; and 3) images only require communication to equipment. In this work, we show that robot vehicles using image observations can achieve performance competitive with robot vehicles using precise information in environments including ring, figure eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms using precise observations, e.g., up to an 8% increase in average vehicle velocity in the merge environment, despite only using local traffic information as opposed to global traffic information.

Paper Structure

This paper contains 21 sections, 5 equations, and 5 figures.

Figures (5)

  • Figure 1: We experiment on five mixed traffic control environments (bottleneck shown in Fig. 2), with image observations presented beneath them. Robot vehicles (RVs) are red, while human-driven vehicles (HVs) are white. In image observations, HVs are cyan to contrast with the white background. We use static, grayscale, $84\times84$ images centered over RVs (or the intersection) that provide only local information. Merge and bottleneck are multi-agent, while the other three are single-agent.
  • Figure 2: Bottleneck environment with heterogeneous human-driven traffic. We add motorcycles (behind leftmost and rightmost RVs), public buses (in front of leftmost RV), semi-trucks (right of public bus), and delivery trucks (diagonally behind the rightmost RV) alongside regular passenger vehicles.
  • Figure 3: LEFT: An RV using image observations prevents stop-and-go waves at all densities, the same as an RV using precise observations. MIDDLE and RIGHT: Time-space diagrams showing stop-and-go waves (which form around $200$ to $300$ seconds) being alleviated after RVs start taking action. MIDDLE: An RV trained on image observations prevents stop-and-go waves similarly to an RV trained on precise observations. RIGHT: An RV trained using only position information can also prevent stop-and-go waves. This further supports using image observations, which do not explicitly include velocity information, to prevent stop-and-go waves.
  • Figure 4: LEFT: An RV using image observations achieves mixed traffic control comparable to an RV with precise observations in figure eight. RIGHT: Overall, RVs with image observations outperform RVs with precise observations, by up to 8% in the $1100/200$, $1300/200$, and $1500/200$ settings.
  • Figure 5: Comparison of queue lengths at the end of an episode between traffic with all human drivers (cyan in mixed traffic) and mixed traffic using image observations. RVs trained with image observations reduce east/westbound congestion, shortening queue lengths by two vehicles.
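
The Figure 1 caption describes the observation as a grayscale $84\times84$ image centered over an RV that carries only local information. The following is a minimal sketch of how such a crop could be taken from a larger bird's-eye-view frame; it is not the authors' rendering pipeline. The function names, the RGB input frame, the luma weights, and the white padding for regions outside the frame are all assumptions made for illustration.

```python
import numpy as np

OBS_SIZE = 84  # side length in pixels, taken from the Figure 1 caption


def to_grayscale(frame_rgb):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) float32 image in [0, 1].

    Assumption: standard ITU-R BT.601 luma weights; the source does not say
    how the grayscale conversion is done.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (frame_rgb.astype(np.float32) @ weights) / 255.0


def local_observation(frame_rgb, center_rc, size=OBS_SIZE, pad_value=1.0):
    """Crop a size x size grayscale patch centered on pixel (row, col).

    Hypothetical helper: areas outside the frame are filled with `pad_value`
    (white here, an assumption based on the white background in the caption).
    """
    gray = to_grayscale(frame_rgb)
    half = size // 2
    # Pad every side by `half` so a crop near the border stays in bounds.
    padded = np.pad(gray, half, mode="constant", constant_values=pad_value)
    r, c = center_rc
    # Original pixel (r, c) sits at (r + half, c + half) in `padded`,
    # so the window starting at (r, c) places it at index (half, half).
    return padded[r:r + size, c:c + size]
```

Because the crop is tied to the RV's pixel position rather than to the road layout, the same helper applies unchanged to any of the five environments, which is the portability argument the abstract makes for image observations.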