CrashAgent: Crash Scenario Generation via Multi-modal Reasoning

Miao Li, Wenhao Ding, Haohong Lin, Yiqi Lyu, Yihang Yao, Yuyou Zhang, Ding Zhao

TL;DR

The paper tackles the scarcity of safety-critical driving scenarios in existing datasets by grounding scenario generation in real crash reports and multi-modal reasoning. It introduces CrashAgent, a three-agent pipeline (Sketch Agent, Road Agent, Scenario Agent) that combines visual tree-of-thought prompting, OpenDRIVE/OpenSCENARIO generation, and a 42-element scenario library with genetic parameter optimization to produce realistic, diverse, simulator-ready crash scenarios. Through quantitative ablations and qualitative evaluations on NHTSA CISS crash data, the authors report improvements in layout accuracy, collision realism, and scenario variety, and they plan to release a large-scale crash-scenario dataset to support autonomous driving safety research. The framework could offer a scalable, interpretable path toward AV safety testing, fault attribution, and counterfactual analysis.

Abstract

Training and evaluating autonomous driving algorithms requires a diverse range of scenarios. However, most available datasets predominantly consist of normal driving behaviors demonstrated by human drivers, resulting in a limited number of safety-critical cases. This imbalance, often referred to as a long-tail distribution, restricts the ability of driving algorithms to learn from crucial scenarios involving risk or failure, scenarios that are essential for humans to develop driving skills efficiently. To generate such scenarios, we utilize Multi-modal Large Language Models to convert crash reports of accidents into a structured scenario format, which can be directly executed within simulations. Specifically, we introduce CrashAgent, a multi-agent framework designed to interpret multi-modal real-world traffic crash reports for the generation of both road layouts and the behaviors of the ego vehicle and surrounding traffic participants. We comprehensively evaluate the generated crash scenarios from multiple perspectives, including the accuracy of layout reconstruction, collision rate, and diversity. The resulting high-quality and large-scale crash dataset will be publicly available to support the development of safe driving algorithms in handling safety-critical situations.
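
The TL;DR mentions genetic parameter optimization over scenario parameters. The toy sketch below illustrates the general idea only and is not the authors' implementation: the two-vehicle perpendicular-road setup, the parameter names (`ego_speed`, `other_speed`, `other_start`), their bounds, and the closest-approach fitness are all assumptions made for illustration. It searches for initial conditions that bring two constant-speed vehicles close enough to collide at an intersection.

```python
import random

# Hypothetical search space: (ego_speed m/s, other_speed m/s, other_start m).
BOUNDS = [(5.0, 20.0), (5.0, 20.0), (10.0, 60.0)]
EGO_START = 40.0  # assumed distance of the ego vehicle from the intersection centre


def min_distance(params, horizon=10.0, dt=0.05):
    """Closest approach of two constant-speed vehicles on perpendicular roads."""
    ego_speed, other_speed, other_start = params
    best = float("inf")
    for i in range(int(horizon / dt) + 1):
        t = i * dt
        ego_x = -EGO_START + ego_speed * t        # ego drives along +x
        other_y = -other_start + other_speed * t  # other drives along +y
        best = min(best, (ego_x ** 2 + other_y ** 2) ** 0.5)
    return best


def clip(value, low, high):
    return max(low, min(high, value))


def evolve(pop_size=30, generations=40, seed=0):
    """Elitist genetic search that minimizes the closest-approach distance."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=min_distance)
        parents = population[: pop_size // 3]  # keep the best third unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation clipped to the bounds.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [
                clip(gene + rng.gauss(0.0, 0.05 * (hi - lo)), lo, hi)
                for gene, (lo, hi) in zip(child, BOUNDS)
            ]
            children.append(child)
        population = parents + children
    best = min(population, key=min_distance)
    return best, min_distance(best)


if __name__ == "__main__":
    params, distance = evolve()
    print("best parameters:", params)
    print("closest approach (m):", distance)
```

In a real pipeline the fitness would presumably come from running the generated scenario in a simulator rather than from a closed-form kinematic model; the closest-approach distance here is only a stand-in for whatever collision criterion is actually used.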

Paper Structure

This paper contains 26 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: CrashAgent converts the crash report and diagram into scenarios that can be tested in simulators.
  • Figure 2: Detailed framework of CrashAgent.
  • Figure 3: A hierarchical category of all crash scenarios in the NHTSA CISS crash dataset.
  • Figure 4: Generation results from CrashAgent with a crash between a vehicle and a deer.
  • Figure 5: Generation results from CrashAgent with a crash between two vehicles in an intersection.
  • ...and 5 more figures