Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation Framework
Xinhao Xiang, Abhijeet Rastogi, Jiawei Zhang
TL;DR
This paper tackles the risk that AI-generated driving videos (AIGVs) may harm autonomous driving (AD) models when used for training or evaluation. It introduces ADGV-Bench, a driving-focused benchmark with human quality annotations and dense perception labels, and ADGVE, a driving-aware evaluator that fuses static-semantic, temporal, lane-obedience, and vision-language model (VLM) checks to rate clip quality. The authors show that naively using raw AIGVs degrades AD perception, whereas filtering them with ADGVE improves downstream detection, tracking, and segmentation and allows AIGVs to complement real data. The work provides a practical, model-agnostic quality gate for safely integrating large-scale generated driving videos into AD pipelines.
Abstract
Recent text-to-video models can generate high-resolution driving scenes from natural-language prompts. These AI-generated driving videos (AIGVs) offer a low-cost, scalable alternative to real-world or simulator data for autonomous driving (AD). A key question remains, however: can such videos reliably support the training and evaluation of AD models? We present a diagnostic framework that systematically studies this question. First, we introduce a taxonomy of frequent AIGV failure modes, including visual artifacts, physically implausible motion, and violations of traffic semantics, and demonstrate their negative impact on object detection, tracking, and instance segmentation. To support this analysis, we build ADGV-Bench, a driving-focused benchmark with human quality annotations and dense labels for multiple perception tasks. We then propose ADGVE, a driving-aware evaluator that combines static semantics, temporal cues, lane-obedience signals, and Vision-Language Model (VLM)-guided reasoning into a single quality score for each clip. Experiments show that blindly adding raw AIGVs can degrade perception performance, whereas filtering them with ADGVE consistently improves both general video quality assessment (VQA) metrics and the performance of downstream AD models, turning AIGVs into a beneficial complement to real-world data. Our study highlights both the risks and the promise of AIGVs and provides practical tools for safely leveraging large-scale video generation in future AD pipelines.
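The abstract says ADGVE combines four kinds of signals into one score per clip and that this score is then used to filter generated clips before they reach AD models. It does not specify the fusion rule, so the following is only a minimal sketch of such a quality gate. It assumes each sub-score is already normalized to [0, 1] and fused by a weighted average. The names `ClipSignals`, `fuse_score`, and `filter_clips`, the equal weights, and the 0.6 threshold are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ClipSignals:
    """Hypothetical per-clip sub-scores, each assumed to lie in [0, 1]."""
    clip_id: str
    static_semantics: float  # assumed: per-frame semantic plausibility
    temporal: float          # assumed: motion / temporal consistency
    lane_obedience: float    # assumed: agreement with lane structure
    vlm_reasoning: float     # assumed: VLM judgment of traffic semantics


# Hypothetical equal weights; the real fusion rule is not given in the abstract.
DEFAULT_WEIGHTS: Mapping[str, float] = {
    "static_semantics": 0.25,
    "temporal": 0.25,
    "lane_obedience": 0.25,
    "vlm_reasoning": 0.25,
}


def fuse_score(sig: ClipSignals, weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of the sub-scores, so the result stays in [0, 1]."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * getattr(sig, name) for name, w in weights.items()) / total


def filter_clips(clips: list[ClipSignals], threshold: float) -> list[str]:
    """Keep only the IDs of clips whose fused score reaches the threshold."""
    return [c.clip_id for c in clips if fuse_score(c) >= threshold]


# Toy example with made-up scores: one clean clip, one with implausible motion.
clips = [
    ClipSignals("clip_a", 0.9, 0.8, 0.95, 0.85),
    ClipSignals("clip_b", 0.7, 0.2, 0.4, 0.3),
]
kept = filter_clips(clips, threshold=0.6)
print(kept)  # ['clip_a']
```

A weighted average is just one simple way to build a single score from several signals. A learned regressor trained on human quality annotations, such as those in ADGV-Bench, would be an equally plausible alternative, and the threshold would in practice be tuned against downstream perception results.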
