SafeMVDrive: Multi-view Safety-Critical Driving Video Synthesis in the Real World Domain
Jiawei Zhou, Linye Lyu, Zhuotao Tian, Cheng Zhuo, Yu Li
TL;DR
SafeMVDrive addresses the scarcity of real-world, multi-view safety-critical data for end-to-end autonomous driving (E2E AD) by combining a VLM-guided adversarial vehicle selector, a two-stage collision-evasion trajectory generator, and a diffusion-based trajectory-to-video synthesizer. The approach produces high-quality, safety-critical, multi-view driving videos grounded in real data, demonstrated on nuScenes with a publicly released 41-scene dataset. Key contributions include a GRPO-finetuned VLM for adversarial vehicle selection, a two-stage, video-compatible trajectory generation pipeline, and a diffusion-based multi-view video generator whose output significantly stresses the planning module of E2E AD systems. The resulting data supports stress-testing and evaluation of autonomous driving planners in realistic, multi-view scenarios, with practical value for safety validation and system development.
Abstract
Safety-critical scenarios are rare yet pivotal for evaluating and enhancing the robustness of autonomous driving systems. While existing methods generate safety-critical driving trajectories, simulations, or single-view videos, they fall short of meeting the demands of advanced end-to-end autonomous driving (E2E AD) systems, which require real-world, multi-view video data. To bridge this gap, we introduce SafeMVDrive, the first framework designed to generate high-quality, safety-critical, multi-view driving videos grounded in real-world domains. SafeMVDrive strategically integrates a safety-critical trajectory generator with an advanced multi-view video generator. To tackle the challenges inherent in this integration, we first enhance the scene understanding of the trajectory generator by incorporating visual context, which was previously unavailable to such generators, and by leveraging a GRPO-finetuned vision-language model to achieve more realistic and context-aware trajectory generation. Second, recognizing that existing multi-view video generators struggle to render realistic collision events, we introduce a two-stage, controllable trajectory generation mechanism that produces collision-evasion trajectories, ensuring both video quality and safety-critical fidelity. Finally, we employ a diffusion-based multi-view video generator to synthesize high-quality safety-critical driving videos from the generated trajectories. Experiments conducted on an E2E AD planner demonstrate a significant increase in collision rate when tested with our generated data, validating the effectiveness of SafeMVDrive in stress-testing planning modules. Our code, examples, and datasets are publicly available at: https://zhoujiawei3.github.io/SafeMVDrive/.
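
The collision-rate evaluation mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the circular vehicle footprint, and the `radius` value are all assumptions made for illustration, and a real benchmark would typically use oriented bounding boxes. The sketch counts a scene as a collision if the planned ego trajectory comes within two radii of any other agent at the same timestep, then reports the fraction of such scenes.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]
# A scene pairs the planner's ego trajectory with the other agents' trajectories.
Scene = Tuple[Trajectory, List[Trajectory]]


def trajectories_collide(ego: Trajectory, other: Trajectory, radius: float = 1.0) -> bool:
    """Return True if two time-aligned trajectories overlap at any timestep.

    Assumption: each vehicle is approximated by a circle of the given radius,
    so two vehicles collide when their centers are closer than 2 * radius.
    """
    return any(math.dist(p, q) < 2 * radius for p, q in zip(ego, other))


def collision_rate(scenes: Sequence[Scene], radius: float = 1.0) -> float:
    """Fraction of scenes in which the ego plan collides with any other agent."""
    if not scenes:
        return 0.0
    hits = sum(
        any(trajectories_collide(ego, other, radius) for other in others)
        for ego, others in scenes
    )
    return hits / len(scenes)


if __name__ == "__main__":
    # Hypothetical toy data: one scene with a cut-in, one with a distant vehicle.
    scenes = [
        ([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], [[(5.0, 0.0), (3.0, 0.0), (2.5, 0.0)]]),
        ([(0.0, 0.0), (1.0, 0.0)], [[(10.0, 10.0), (10.0, 11.0)]]),
    ]
    print(collision_rate(scenes))
```

Comparing this rate on original scenes against the same planner's rate on generated safety-critical scenes is one simple way to quantify how much harder the generated data is for the planner.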
