Mixed Traffic Control and Coordination from Pixels
Michael Villarreal, Bibek Poudel, Jia Pan, Weizi Li
TL;DR
This work addresses congestion in mixed traffic by replacing precise sensor-based observations with bird's-eye-view (BEV) image inputs for the reinforcement learning policies that control robot vehicles. Using Proximal Policy Optimization, the authors demonstrate that image-based observations achieve performance competitive with, and in some cases superior to, traditional precise observations across ring, figure-eight, merge, intersection, and bottleneck environments. The study highlights four key contributions: (1) validating BEV image observations as a generalizable, end-to-end input modality; (2) achieving comparable or improved throughput and wave-damping across multiple scenarios; (3) reducing infrastructure and V2V/V2I requirements; and (4) offering insights into the limits and potential of image-based control for real-world traffic systems. The results suggest practical implications for deploying image-rich sensing in traffic networks and point to future work on scaling, robustness, and integration with predictive traffic signals and trajectory data.
Abstract
Traffic congestion is a persistent problem in our society. Previous methods for traffic control have proven futile in alleviating current congestion levels, leading researchers to explore ideas with robot vehicles given the increased emergence of vehicles with different levels of autonomy on our roads. This gives rise to mixed traffic control, where robot vehicles regulate human-driven vehicles through reinforcement learning (RL). However, most existing studies use precise observations that require domain expertise and hand engineering for each road network's observation space. Additionally, precise observations use global information, such as environment outflow, and local information, i.e., vehicle positions and velocities. Obtaining this information requires updating existing road infrastructure with vast sensor environments and communicating with potentially unwilling human drivers. We consider image observations, a modality that has not been extensively explored for mixed traffic control via RL, as the alternative: 1) images do not require a complete re-imagination of the observation space from environment to environment; 2) images are ubiquitous through satellite imagery, in-car camera systems, and traffic monitoring systems; and 3) images only require communication with equipment. In this work, we show that robot vehicles using image observations can achieve performance competitive with that of precise observations across environments including ring, figure-eight, intersection, merge, and bottleneck. In certain scenarios, our approach even outperforms precise observations, e.g., up to an 8% increase in average vehicle velocity in the merge environment, despite using only local traffic information as opposed to global traffic information.
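To make the idea of a local image observation concrete, the following is a minimal sketch, not the authors' rendering pipeline. It rasterizes vehicles on a ring road into a small grayscale bird's-eye-view image and crops a window centered on the robot vehicle. The function names `render_bev` and `local_crop`, the 84x84 image size, the ring length, the pixel intensities, and the crop size are all illustrative assumptions that do not come from the paper.

```python
import numpy as np

def render_bev(positions, robot_idx, ring_length=230.0, size=84, radius_frac=0.4):
    """Rasterize vehicles on a ring road into a grayscale image.

    All parameters are hypothetical choices for illustration only.
    Returns the image and the (row, col) pixel of the robot vehicle.
    """
    img = np.zeros((size, size), dtype=np.uint8)
    center = (size - 1) / 2.0
    radius = radius_frac * size  # ring drawn well inside the image bounds
    # Map arc-length position along the ring to an angle, then to a pixel.
    angles = 2.0 * np.pi * np.asarray(positions, dtype=float) / ring_length
    cols = np.rint(center + radius * np.cos(angles)).astype(int)
    rows = np.rint(center + radius * np.sin(angles)).astype(int)
    img[rows, cols] = 128  # human-driven vehicles (assumed intensity)
    img[rows[robot_idx], cols[robot_idx]] = 255  # robot vehicle (assumed intensity)
    return img, (rows[robot_idx], cols[robot_idx])

def local_crop(img, center_rc, half=10):
    """Return a (2*half+1)-pixel square window centered on center_rc.

    The image is zero-padded so the window is always full size.
    """
    padded = np.pad(img, half, mode="constant")
    r, c = center_rc
    r, c = r + half, c + half  # shift into padded coordinates
    return padded[r - half : r + half + 1, c - half : c + half + 1]

# Four evenly spaced vehicles; vehicle 0 is the robot vehicle.
image, robot_pixel = render_bev([0.0, 57.5, 115.0, 172.5], robot_idx=0)
observation = local_crop(image, robot_pixel, half=10)
```

Here `observation` contains only the pixels near the robot vehicle, which is one way to picture local rather than global traffic information. A policy network would take such an array as input in place of hand-engineered position and velocity vectors.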
