Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao

TL;DR

This work addresses the need for temporally informed road anomaly segmentation in autonomous driving by introducing a large synthetic video dataset with ground-truth anomaly masks and aligned G-buffers, plus a photorealistic enhancement toolkit to bridge the synthetic-real domain gap. It defines latency-aware streaming metrics and a temporal consistency measure to jointly evaluate accuracy and timely detection under realistic motion, where latency is expressed as $\Delta t$ frames. The dataset comprises 220 sequences at 60 FPS with 600 frames per sequence, high-resolution imagery, and rich rendering channels, enabling fine-grained latency assessment. The results indicate that while retraining with anomalous data improves latency-agnostic performance, it can hurt temporal stability, underscoring the importance of the new metrics for safety-critical deployment in autonomous driving.

Abstract

Over the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale is straightforward: if an autonomous car can brake before hitting an anomalous object, safety is improved. However, this rationale naturally calls for a temporally informed setting, while existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads and annotating them in every frame is dangerous and expensive, we resort to synthetic data. To improve the relevance of this synthetic dataset to real-world applications, we train a generative adversarial network conditioned on rendering G-buffers for photorealism enhancement. Our dataset consists of 120,000 high-resolution frames at a frame rate of 60 FPS, recorded in 7 different towns. As an initial benchmark, we provide baselines using the latest supervised and unsupervised road anomaly segmentation methods. Apart from conventional metrics, we focus on two new ones: temporal consistency and latency-aware streaming accuracy. We believe the latter is valuable because it measures whether an anomaly segmentation algorithm can truly prevent a car from crashing in a temporally informed setting.

Paper Structure (17 sections, 3 equations, 6 figures, 1 table)

This paper contains 17 sections, 3 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Key features of the proposed benchmark. (a) 60 of the 600 frames in one video sample. Each frame has a resolution of $1920 \times 1080$ and each video sample has a frame rate of 60 FPS. (b) Each frame is recorded with aligned semantic, instance and anomaly maps as well as the rendering G-buffers: depth, diffuse, normal, metallic, specular and roughness. (c) Concept illustration of inference latency (i.e., the inference time of the road anomaly segmentation method under evaluation). The predictions of high-latency methods become outdated and unusable in practice. (d) Comparison before and after applying photorealistic rendering to transfer to the styles of Cityscapes [cordts2016cityscapes] and nuScenes [nuscenes2019].
  • Figure 2: The demonstration of available anomalous objects in the dataset. Anomalous objects are emphasized by the yellow boxes added in postprocessing.
  • Figure 3: The spatial distribution of anomalous objects. Red area demonstrates higher frequency of anomalous objects.
  • Figure 4: The examples of photorealistic enhancement on the collected simulated video frames. The anomalous objects are also enhanced with the same style.
  • Figure 5: Illustration of the proposed metrics. (a) Latency-Agnostic Metrics. The evaluation is performed on a per-frame basis, irrespective of time or latency. (b) Latency-Aware Streaming Metrics. The predicted anomaly score map for the input frame at time $T_0$ is compared with the ground truth anomaly map at time $T_0 + \Delta t$. Here $\Delta t$ is the latency of the method. (c) Temporal Consistency Metric. The predicted anomaly map at time $T_0$ is projected to the image space at $T_0 + \Delta t$ and compared with the predicted anomaly map at time $T_0 + \Delta t$. The latency $\Delta t$ between the two demonstrated frames is 60 frames (or equivalently 1 second).
  • ...and 1 more figure
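
The latency-aware pairing described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names (`latency_in_frames`, `iou`, `streaming_iou`, `toy_sequence`) are hypothetical, the choice to round latency up to whole frames is an assumption, and intersection-over-union on binary masks stands in for whichever accuracy metrics the paper actually reports. The only idea taken from the source is that the prediction computed from the frame at $T_0$ is scored against the ground truth at $T_0 + \Delta t$.

```python
import math
from typing import List, Sequence

# A binary anomaly mask: rows of 0/1 values (assumed representation).
Mask = Sequence[Sequence[int]]


def latency_in_frames(latency_seconds: float, fps: float = 60.0) -> int:
    """Convert wall-clock latency to whole frames, rounding up (assumption)."""
    return max(0, math.ceil(latency_seconds * fps - 1e-9))


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def streaming_iou(preds: Sequence[Mask], gts: Sequence[Mask], delta_t: int) -> float:
    """Mean IoU between the prediction for frame t and the ground truth at t + delta_t."""
    if delta_t < 0:
        raise ValueError("delta_t must be non-negative")
    pairs = [(preds[t], gts[t + delta_t]) for t in range(len(gts) - delta_t)]
    if not pairs:
        raise ValueError("sequence is shorter than the latency")
    return sum(iou(p, g) for p, g in pairs) / len(pairs)


def toy_sequence(num_frames: int = 6, height: int = 4, width: int = 8) -> List[List[List[int]]]:
    """A 2x2 anomaly that moves one pixel to the right in every frame."""
    return [
        [[1 if 1 <= r <= 2 and t <= c < t + 2 else 0 for c in range(width)]
         for r in range(height)]
        for t in range(num_frames)
    ]


if __name__ == "__main__":
    frames = toy_sequence()
    print(latency_in_frames(1.0))  # 60 frames at 60 FPS, matching the caption
    for delta_t in (0, 1, 2):
        # Per-frame predictions are perfect here, so any drop is due to latency alone.
        print(delta_t, streaming_iou(frames, frames, delta_t))
```

In this toy sequence the per-frame predictions are perfect, so the latency-agnostic score (`delta_t = 0`) is 1.0, yet the streaming score falls to about 0.33 at one frame of latency and to 0.0 at two frames because the object has moved by the time the prediction becomes available. The temporal consistency measure in panel (c) would additionally require projecting the earlier prediction into the later frame, which is not attempted here.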