DriveSceneGen: Generating Diverse and Realistic Driving Scenarios from Scratch
Shuo Sun, Zekai Gu, Tianchen Sun, Jiawei Sun, Chengran Yuan, Yuhang Han, Dongen Li, Marcelo H. Ang
TL;DR
DriveSceneGen tackles the shortage of diverse driving scenarios by learning from real-world data and generating complete scenes from scratch. It works in three stages: a diffusion-based generation stage creates a rasterized bird's-eye-view (BEV) representation of the initial scene, a graph-based vectorization stage recovers a coherent lane map from that raster, and a simulation stage uses a repurposed multi-modal trajectory predictor to produce multiple plausible futures conditioned on the generated scene. To the authors' knowledge, it is the first method to generate static maps and dynamic agents jointly. Evaluated on the Waymo Motion dataset with 5k generated scenarios against 70k ground-truth scenarios, it demonstrates high fidelity and diversity. This work enables scalable, data-driven generation of driving scenarios for training, validation, and safety testing of autonomous systems, with potential applicability beyond robotics.
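To make the idea of raster-to-graph vectorization concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a tiny binary grid in which 1 marks a lane-centerline pixel, links 8-connected pixels into an undirected graph, and then reads off separate lane segments and their endpoints. The names `raster_to_graph`, `connected_components`, `endpoints`, and `TOY_RASTER` are hypothetical and chosen only for illustration.

```python
from collections import deque

# Offsets (row, col) of the 8-connected neighbourhood around a pixel.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def raster_to_graph(raster):
    """Build an undirected pixel graph (adjacency dict) from a binary raster."""
    cells = {(r, c) for r, row in enumerate(raster) for c, v in enumerate(row) if v}
    return {
        (r, c): [(r + dr, c + dc) for dr, dc in NEIGHBOURS if (r + dr, c + dc) in cells]
        for (r, c) in cells
    }


def connected_components(graph):
    """Group nodes into connected components with a breadth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        queue, component = deque([start]), []
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def endpoints(graph):
    """Return nodes with exactly one neighbour, i.e. where a segment starts or ends."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) == 1)


# Toy raster (an assumption for illustration): one horizontal and one vertical segment.
TOY_RASTER = [
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
]

if __name__ == "__main__":
    graph = raster_to_graph(TOY_RASTER)
    print(len(connected_components(graph)), "segments; endpoints:", endpoints(graph))
```

A real vectorization stage would also have to handle thick or noisy lane strokes, junctions, and lane direction, none of which this toy example attempts.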
Abstract
Realistic and diverse traffic scenarios in large quantities are crucial for the development and validation of autonomous driving systems. However, owing to numerous difficulties in the data collection process and the reliance on intensive annotations, real-world datasets lack sufficient quantity and diversity to support the increasing demand for data. This work introduces DriveSceneGen, a data-driven driving scenario generation method that learns from real-world driving data and generates entire dynamic driving scenarios from scratch. DriveSceneGen generates novel driving scenarios that align with real-world data distributions with high fidelity and diversity. Experimental results on 5k generated scenarios highlight the generation quality, diversity, and scalability compared to real-world datasets. To the best of our knowledge, DriveSceneGen is the first method that generates novel driving scenarios involving both static map elements and dynamic traffic participants from scratch.
