From Dashcam Videos to Driving Simulations: Stress Testing Automated Vehicles against Rare Events

Yan Miao, Georgios Fainekos, Bardh Hoxha, Hideki Okamoto, Danil Prokhorov, Sayan Mitra

TL;DR

This work tackles the problem of converting real-world dashcam crash videos into realistic, parameterizable ADS test scenarios in CARLA. It introduces a four-component framework that uses prompt-engineered Video Language Models to generate SCENIC scripts ($S_{out}$) from videos, executes these scripts in CARLA to produce simulated videos ($V_{sim}$), and employs a similarity metric $Sim(V_{real}, V_{sim})$ with per-feature thresholds $\tau_i$ to iteratively refine the scenarios. Key contributions include automated real-to-sim conversion, a similarity metric that bridges real and simulated driving features, an iterative feedback loop using ScriptGPT and FeatureGPT, and substantial time savings (minutes rather than hours) while preserving essential driving behaviors. This approach enables flexible, behavior-focused test case generation for search-based testing of ADS, with the potential to improve robustness by exploring diverse environmental variations around core driving events.

Abstract

Testing Automated Driving Systems (ADS) in simulation with realistic driving scenarios is important for verifying their performance. However, converting real-world driving videos into simulation scenarios is a significant challenge due to the complexity of interpreting high-dimensional video data and the time-consuming nature of precise manual scenario reconstruction. In this work, we propose a novel framework that automates the conversion of real-world car crash videos into detailed simulation scenarios for ADS testing. Our approach leverages prompt-engineered Video Language Models (VLMs) to transform dashcam footage into SCENIC scripts, which define the environment and driving behaviors in the CARLA simulator, enabling the generation of realistic simulation scenarios. Importantly, rather than solely aiming for one-to-one scenario reconstruction, our framework focuses on capturing the essential driving behaviors from the original video while offering flexibility in parameters such as weather or road conditions to facilitate search-based testing. Additionally, we introduce a similarity metric that helps iteratively refine the generated scenario through feedback by comparing key features of driving behaviors between the real and simulated videos. Our preliminary results demonstrate substantial time efficiency, finishing the real-to-sim conversion in minutes with full automation and no human intervention, while maintaining high fidelity to the original driving events.

Paper Structure

This paper contains 23 sections, 1 equation, 9 figures, 1 table.

Figures (9)

  • Figure 1: Train ScriptGPT and FeatureGPT using Prompt Engineering
  • Figure 2: After prompt engineering, the dash cam video is fed into ScriptGPT, which synthesizes descriptive language in the SCENIC format. This SCENIC script can then be executed in CARLA to generate a corresponding testing scenario in simulation.
  • Figure 3: Iterative Refinement Process: After obtaining the simulated video $V_{sim}$ from ScriptGPT and SCENIC, both the original and simulated videos are fed into FeatureGPT to evaluate the probabilities of predefined features. If the difference of any feature between the original and simulated videos exceeds a certain threshold, we iteratively refine ScriptGPT by incorporating additional feedback into the SCENIC script, guiding further scenario adjustments until the similarity improves.
  • Figure 4: Vehicle Cutting In with Pedestrian Crossing Scenario: in the original dashcam video (top row), the vehicle on the right performs an emergency lane change to the left due to a jaywalking pedestrian in red. In the generated scenario (bottom row) produced by our framework, the vehicle on the right exhibits a similar lane change to the left to avoid a jaywalking pedestrian.
  • Figure 5: Opposite Vehicle Invading Lane Scenario: in the original dashcam video (top row), the vehicle in the opposite lane gradually shifts into the ego vehicle's lane, probably due to loss of focus. In the generated scenario (bottom row) produced by our framework, the vehicle in the opposite lane exhibits a similar lane change into the ego vehicle's lane and causes a collision.
  • ...and 4 more figures
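
The refinement loop described in the Figure 3 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature names, threshold values, and the `generate` callable (standing in for the whole ScriptGPT, SCENIC/CARLA, and FeatureGPT pipeline) are all hypothetical. The only behavior taken from the source is that per-feature probabilities of the real and simulated videos are compared, and any feature whose difference exceeds its threshold is fed back to guide the next script.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]  # feature name -> probability in [0, 1]


def mismatched_features(real: Features, sim: Features,
                        thresholds: Dict[str, float]) -> List[str]:
    """Return the features whose real-vs-sim gap exceeds their threshold."""
    return [name for name, tau in thresholds.items()
            if abs(real[name] - sim[name]) > tau]


def refine(real: Features,
           generate: Callable[[List[str]], Features],
           thresholds: Dict[str, float],
           max_rounds: int = 5) -> Tuple[Features, int, bool]:
    """Regenerate the scenario until every feature is within its threshold.

    `generate` is a hypothetical stand-in: it receives the list of
    mismatched feature names as feedback and returns the feature
    probabilities of the newly simulated video.
    Returns (last simulated features, rounds used, converged flag).
    """
    feedback: List[str] = []
    sim: Features = {}
    for round_no in range(1, max_rounds + 1):
        sim = generate(feedback)
        feedback = mismatched_features(real, sim, thresholds)
        if not feedback:
            return sim, round_no, True
    return sim, max_rounds, False


# Toy demonstration with made-up numbers (no VLM or simulator involved).
real = {"lane_change": 0.9, "pedestrian_crossing": 0.8}
thresholds = {"lane_change": 0.2, "pedestrian_crossing": 0.2}
state = {"lane_change": 0.1, "pedestrian_crossing": 0.75}


def toy_generate(feedback: List[str]) -> Features:
    # Pretend the feedback fixes each flagged feature in the next script.
    for name in feedback:
        state[name] = real[name]
    return dict(state)


final, rounds, converged = refine(real, toy_generate, thresholds)
print(final, rounds, converged)
```

In the toy run, only the hypothetical `lane_change` feature is flagged in the first round, so the loop stops after the second generation. Capping the number of rounds is a design choice of this sketch; the source does not state a stopping rule beyond the similarity improving.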