From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events
Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra
TL;DR
This work tackles the challenging problem of converting real-world dashcam crash videos into realistic, parameterizable ADS test scenarios in CARLA. It introduces a four-component framework that uses prompt-engineered Video Language Models (VLMs) to generate SCENIC scripts ($S_{out}$) from videos, renders these scripts as CARLA simulations ($V_{sim}$), and employs a similarity metric $Sim(V_{real}, V_{sim})$ with per-feature thresholds $\tau_i$ to iteratively refine the scenarios. Key contributions include automated real-to-sim conversion, a similarity metric that bridges real and simulated driving features, an iterative feedback loop using ScriptGPT and FeatureGPT, and substantial time savings (minutes rather than hours) while preserving essential driving behaviors. This approach enables flexible, behavior-focused test case generation for search-based testing of ADS, with the potential to improve robustness by exploring diverse environmental variations around core driving events.
Abstract
Testing Automated Driving Systems (ADS) in simulation with realistic driving scenarios is important for verifying their performance. However, converting real-world driving videos into simulation scenarios is a significant challenge because high-dimensional video data is complex to interpret and precise manual scenario reconstruction is time-consuming. In this work, we propose a novel framework that automates the conversion of real-world car crash videos into detailed simulation scenarios for ADS testing. Our approach leverages prompt-engineered Video Language Models (VLMs) to transform dashcam footage into SCENIC scripts, which define the environment and driving behaviors in the CARLA simulator, enabling the generation of realistic simulation scenarios. Importantly, rather than aiming solely for one-to-one scenario reconstruction, our framework focuses on capturing the essential driving behaviors from the original video while offering flexibility in parameters such as weather or road conditions to facilitate search-based testing. Additionally, we introduce a similarity metric that compares key features of driving behavior between the real and simulated videos and uses the result as feedback to iteratively refine the generated scenario. Our preliminary results demonstrate substantial time efficiency, completing the real-to-sim conversion in minutes with full automation and no human intervention, while maintaining high fidelity to the original driving events.
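The refinement loop described above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption for illustration rather than the framework's actual implementation: the feature names, the exact-match scoring rule, and the callables `generate_script` and `simulate_and_extract` (stand-ins for the VLM-based script generation and for CARLA rendering plus feature extraction) are hypothetical. The only ideas taken from the text are that real and simulated videos are compared feature by feature, that each feature has its own threshold $\tau_i$, and that features falling below threshold are fed back to regenerate the script.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, str]  # hypothetical: feature name -> categorical value


def feature_similarity(real: Features, sim: Features) -> Dict[str, float]:
    """Score each real-video feature: 1.0 on exact match, else 0.0.

    Placeholder scoring rule (an assumption); the source does not
    specify how per-feature similarity is computed.
    """
    return {name: 1.0 if sim.get(name) == value else 0.0
            for name, value in real.items()}


def failing_features(scores: Dict[str, float],
                     thresholds: Dict[str, float]) -> List[str]:
    """Return features whose score is below its threshold tau_i."""
    return [name for name, score in scores.items()
            if score < thresholds.get(name, 1.0)]


def refine(real: Features,
           generate_script: Callable[[List[str]], str],
           simulate_and_extract: Callable[[str], Features],
           thresholds: Dict[str, float],
           max_iters: int = 5) -> Tuple[str, int, bool]:
    """Regenerate the script until every feature meets its threshold.

    Returns (last script, iterations used, whether all thresholds were met).
    """
    feedback: List[str] = []  # mismatched features from the previous round
    script = ""
    for iteration in range(1, max_iters + 1):
        script = generate_script(feedback)           # hypothetical VLM call
        sim_features = simulate_and_extract(script)  # hypothetical simulator call
        scores = feature_similarity(real, sim_features)
        feedback = failing_features(scores, thresholds)
        if not feedback:
            return script, iteration, True
    return script, max_iters, False
```

In this sketch the feedback is simply the list of feature names that missed their thresholds; a real pipeline would presumably pass richer information back to the script generator. The iteration cap keeps the loop bounded when a scenario cannot be matched.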
