A Simple Framework Towards Vision-based Traffic Signal Control with Microscopic Simulation
Pan He, Quanyi Li, Xiaoyong Yuan, Bolei Zhou
TL;DR
This work introduces TrafficDojo, a modular, open-source framework that couples the microscopic traffic simulation of SUMO with the 3D rendering of MetaDrive to enable vision-based traffic signal control (TSC). With flexible sensor configurations, multi-view bird's-eye-view (BEV) observations, and a suite of reinforcement learning (RL) and traditional baselines, the authors benchmark end-to-end TSC policies learned directly from visual inputs, including policies that use frozen foundation models for feature extraction. Key findings show that BEV-based vision RL policies, especially when paired with discriminative features from foundation models (BEV*), can outperform feature-based approaches and offer improved convergence and reduced CO2 emissions in simulated intersections. The framework's open, extensible design and its combined SUMO/MetaDrive setup support robust evaluation of vision-based TSC and can accelerate research across the reinforcement learning and transportation communities.
Abstract
Traffic signal control (TSC) is crucial for reducing traffic congestion, which leads to smoother traffic flow, reduced idle time, and lower CO2 emissions. In this paper, we explore a computer vision approach to TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend much less on heuristics and predefined features, offering promising potential for end-to-end learning and optimization of traffic signals. Thus, we introduce TrafficDojo, a simple traffic simulation framework and benchmark for vision-based TSC, which integrates the microscopic traffic flow provided by SUMO into the 3D driving simulator MetaDrive. The proposed framework offers a versatile traffic environment for in-depth analysis and comprehensive evaluation of traffic signal controllers across diverse traffic conditions and scenarios. We establish and compare baseline algorithms, including both traditional and reinforcement learning (RL) approaches. This work sheds light on the design and development of vision-based TSC approaches and opens up new research opportunities.
