Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning
Joydeep Chandra, Satyam Kumar Navneet, Aleksandr Algazinov, Yong Zhang
TL;DR
The paper addresses safe urban traffic control by propagating calibrated uncertainty across forecasting, anomaly detection, and reinforcement learning. It introduces PU-GAT+ for uncertainty-guided attention, CRFN-BY for dependence-robust anomaly detection with conformal p-values, and LyCon-WRL+ for Lyapunov-certified safe RL with Lipschitz bounds from spectral normalization. The framework achieves distribution-free coverage (≈91.4%), FDR control under dependence (≈4.1%), and safer RL behavior (≈95.2% safe episodes) with real-time inference (~23 ms), demonstrating that reliability guarantees can be maintained without sacrificing performance. These results suggest significant practical impact for deploying robust, safe ML-guided traffic control in urban environments. The paper also outlines limitations, such as BY conservatism and scalability challenges, that are worth addressing in future work.
Abstract
Urban traffic management demands systems that simultaneously predict future conditions, detect anomalies, and take safe corrective actions, all while providing reliability guarantees. We present STREAM-RL, a unified framework that introduces three novel algorithmic contributions: (1) PU-GAT+, an Uncertainty-Guided Adaptive Conformal Forecaster that uses prediction uncertainty to dynamically reweight graph attention via confidence-monotonic attention, achieving distribution-free coverage guarantees; (2) CRFN-BY, a Conformal Residual Flow Network that models uncertainty-normalized residuals via normalizing flows with Benjamini-Yekutieli FDR control under arbitrary dependence; and (3) LyCon-WRL+, an Uncertainty-Guided Safe World-Model RL agent with Lyapunov stability certificates, certified Lipschitz bounds, and uncertainty-propagated imagination rollouts. To our knowledge, this is the first framework to propagate calibrated uncertainty from forecasting through anomaly detection to safe policy learning with end-to-end theoretical guarantees. Experiments on multiple real-world traffic trajectory datasets demonstrate that STREAM-RL achieves 91.4% coverage efficiency, controls FDR at 4.1% under verified dependence, and improves the safety rate to 95.2%, compared to 69% for standard PPO, while achieving higher reward, with 23 ms end-to-end inference latency.
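As an illustration of the FDR-control step named in contribution (2), the following is a minimal sketch of conformal p-values followed by the standard Benjamini-Yekutieli step-up procedure. It is not the paper's CRFN-BY implementation: the function names `conformal_p_values` and `benjamini_yekutieli`, the use of raw scalar nonconformity scores, and the default `alpha=0.05` are assumptions made for this example, and the normalizing-flow scoring model is omitted entirely.

```python
def conformal_p_values(calibration_scores, test_scores):
    """Conformal p-value per test score (hypothetical helper).

    p = (1 + #{calibration scores >= test score}) / (n + 1),
    where a larger score is assumed to mean "more anomalous".
    """
    n = len(calibration_scores)
    return [
        (1 + sum(1 for c in calibration_scores if c >= s)) / (n + 1)
        for s in test_scores
    ]


def benjamini_yekutieli(p_values, alpha=0.05):
    """Benjamini-Yekutieli step-up procedure.

    Returns a list of booleans (True = flagged as a discovery).
    The harmonic correction c(m) = sum_{i=1..m} 1/i is what makes the
    procedure valid under arbitrary dependence, at the cost of being
    more conservative than Benjamini-Hochberg.
    """
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject
```

In this sketch, the p-values would come from scoring held-out normal traffic (calibration) and new observations (test) with the same anomaly score. The extra factor c(m) in the threshold is the source of the BY conservatism mentioned as a limitation.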
