Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics
Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao
TL;DR
This work addresses the need for temporally informed road anomaly segmentation in autonomous driving by introducing a large synthetic video dataset with ground-truth anomaly masks and aligned G-buffers, plus a photorealistic enhancement toolkit to bridge the synthetic-real domain gap. It defines latency-aware streaming metrics and a temporal consistency measure to jointly evaluate accuracy and timely detection under realistic motion, where latency is expressed as $\Delta t$ frames. The dataset comprises 220 sequences at 60 FPS with 600 frames per sequence, high-resolution imagery, and rich rendering channels, enabling fine-grained latency assessment. The results indicate that while retraining with anomalous data improves latency-agnostic performance, it can hurt temporal stability, underscoring the importance of the new metrics for safety-critical deployment in autonomous driving.
Abstract
In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale is straightforward: if an autonomous car can brake before hitting an anomalous object, safety is improved. However, this rationale naturally calls for a temporally informed setting, whereas existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads and annotating them in every frame is dangerous and expensive, we resort to synthetic data. To improve the relevance of this synthetic dataset to real-world applications, we train a generative adversarial network conditioned on rendering G-buffers for photorealism enhancement. Our dataset consists of 120,000 high-resolution frames at a 60 FPS framerate, recorded in 7 different towns. As an initial benchmark, we provide baselines using the latest supervised and unsupervised road anomaly segmentation methods. Apart from conventional metrics, we focus on two new ones: temporal consistency and latency-aware streaming accuracy. We believe the latter is valuable because it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.
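To make the idea of latency-aware streaming accuracy concrete, the following is a minimal sketch, not the paper's exact definition, which this section does not give. It assumes that a prediction computed from frame t - Δt is scored against the ground truth of frame t, with Δt measured in frames and IoU as the per-frame score. The names `iou`, `streaming_iou`, and `latency_seconds` are hypothetical helpers introduced only for illustration.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary anomaly mask: rows of 0/1 values


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


def streaming_iou(preds: Sequence[Mask], gts: Sequence[Mask], delta_t: int) -> float:
    """Mean IoU when the prediction from frame t - delta_t is scored
    against the ground truth at frame t (assumed streaming protocol)."""
    if delta_t < 0 or delta_t >= len(gts):
        raise ValueError("delta_t must satisfy 0 <= delta_t < number of frames")
    scores = [iou(preds[t - delta_t], gts[t]) for t in range(delta_t, len(gts))]
    return sum(scores) / len(scores)


def latency_seconds(delta_t: int, fps: float = 60.0) -> float:
    """Convert a latency of delta_t frames into seconds at the given frame rate."""
    return delta_t / fps


# Toy sequence: a one-pixel anomaly moving one pixel per frame.
gts = [[[1, 0, 0, 0]], [[0, 1, 0, 0]], [[0, 0, 1, 0]]]
preds = gts  # a frame-wise perfect predictor
print(streaming_iou(preds, gts, 0))  # latency-agnostic score
print(streaming_iou(preds, gts, 1))  # score with one frame of latency
```

Under these assumptions, a predictor that is perfect frame by frame can still score poorly once latency is taken into account, because the anomaly has moved by the time the prediction becomes available. At 60 FPS, each frame of latency corresponds to 1/60 of a second.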
