Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge

Yuhang Wang, Heye Huang, Zhenhua Xu, Kailai Sun, Baoshen Guo, Jinhua Zhao

TL;DR

Learning from Risk addresses the scarcity of rare, safety-critical events in autonomous driving validation by fusing data-driven motion priors with knowledge-guided optimization. The framework combines a CVAE-GNN, which learns latent traffic dynamics from the highD and nuScenes datasets, with an LLM that parses scene descriptions into adaptive loss terms and steers generation across risk levels. Experiments in CARLA and SMARTS show that this approach substantially increases long-tail event coverage while preserving realism and sim-to-real fidelity, exposing autonomous driving systems (ADS) to more challenging interactions than existing baselines. The work offers a principled pathway for safety validation and stress-testing of autonomous systems under rare but consequential events.

Abstract

Autonomous driving faces critical challenges in rare long-tail events and complex multi-agent interactions, which are scarce in real-world data yet essential for robust safety validation. This paper presents a high-fidelity scenario generation framework that integrates a conditional variational autoencoder (CVAE) with a large language model (LLM). The CVAE encodes historical trajectories and map information from large-scale naturalistic datasets to learn latent traffic structures, enabling the generation of physically consistent base scenarios. Building on this, the LLM acts as an adversarial reasoning engine, parsing unstructured scene descriptions into domain-specific loss functions and dynamically guiding scenario generation across varying risk levels. This knowledge-driven optimization balances realism with controllability, ensuring that generated scenarios remain both plausible and risk-sensitive. Extensive experiments in CARLA and SMARTS demonstrate that our framework substantially increases the coverage of high-risk and long-tail events, improves consistency between simulated and real-world traffic distributions, and exposes autonomous driving systems to interactions that are significantly more challenging than those produced by existing rule- or data-driven methods. These results establish a new pathway for safety validation, enabling principled stress-testing of autonomous systems under rare but consequential events.
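
To make the idea of LLM-derived loss weighting concrete, the following is a minimal sketch of how per-term losses could be combined into a single objective once weights are available. It is an illustration under stated assumptions, not the authors' implementation: the function name `combine_losses`, the dictionary keys, and all numeric values are hypothetical, and a plain weighted sum is assumed because the exact form of the objective is not given in this section. The key names loosely mirror the loss terms listed in the Figure 1 caption.

```python
from typing import Dict


def combine_losses(loss_terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted sum of named loss terms.

    Assumption: the total objective is a plain weighted sum. The paper's
    actual objective may differ.
    """
    missing = set(weights) - set(loss_terms)
    if missing:
        # Fail loudly if a weight refers to a loss term that was not computed.
        raise KeyError(f"no loss value for weighted terms: {sorted(missing)}")
    return sum(weights[name] * loss_terms[name] for name in weights)


# Hypothetical per-term loss values for one generated scenario.
loss_terms = {"adversarial_crash": 0.8, "min_dist_lat": 0.3, "yaw_rate": 0.1, "ttc": 0.5}

# Hypothetical weights, standing in for what an LLM might return for a
# "high risk" request; the numbers are made up for illustration.
high_risk_weights = {"adversarial_crash": 2.0, "min_dist_lat": 1.0, "yaw_rate": 0.5, "ttc": 1.5}

total = combine_losses(loss_terms, high_risk_weights)
```

Keeping the weights in a separate mapping means the same loss values can be re-scored under different risk levels without recomputing them, which is one plausible way to read "dynamically guiding scenario generation across varying risk levels".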

Paper Structure

This paper contains 23 sections, 27 equations, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Overview of the proposed LLM-guided CVAE-GNN framework for safety-critical traffic scenario generation. The left module learns motion priors from large-scale datasets (highD, nuScenes) and encodes multi-agent interactions into a latent variable $z$. The middle module parses scene semantics and employs chain-of-thought prompting to generate adaptive loss hints. A high-level controller integrates these hints with a risk field model that computes omnidirectional risk metrics and gradient feedback, dynamically adjusting loss weights ($L_\text{AdversarialCrash}$, $L_\text{MinDist\_lat}$, $L_\text{YawRate}$, $L_\text{TTC}$). The right module reshapes the latent space to generate scenarios covering low-, medium-, and high-risk regimes, resulting in a continuous distribution of risk-aware driving interactions.
  • Figure 2: Overview of the knowledge-guided loss function optimization framework. The LLM interprets scene semantics and risk indicators through chain-of-thought reasoning, computes adaptive loss weights, and guides CVAE latent-space optimization for generating physically consistent and risk-controllable traffic scenarios.
  • Figure 3: Evolution of vehicle interactions in long-tail traffic scenarios. Rows correspond to conflict types: (a) merging, (b) turning, and (c) intersection; columns show temporal evolution at $T=0$, $3$, and $6$ s. TTC and THW illustrate the progression from safe to near-collision states, highlighting the framework’s ability to generate dynamic, risk-sensitive interactions.
  • Figure 4: Comparison between original and LLM-driven scenarios. Each pair shows how the framework transforms safe trajectories (left) into high-risk, long-tail events (right): (a) left-turn conflict; (b) multi-vehicle turning conflict; (c) rear-end collision; (d) roadside entry conflict; (e) sudden deceleration; (f) intersection multi-agent risk. The LLM-guided process interprets scenario semantics and adaptively optimizes risk objectives to reproduce realistic and diverse safety-critical interactions.
  • Figure 5: Case study of long-tail risk identification and generation at the One-North urban intersection in Singapore. The ego vehicle performs a left turn under multi-agent interactions involving both stationary and moving vehicles. The bottom pipeline shows five steps: (1) scenario identification, (2) risk indicator computation, (3) behavior selection, (4) LLM-guided scenario generation, and (5) validation. The process combines quantitative metrics (TTC, THW, TLC, distance) with reasoning to produce realistic long-tail traffic scenarios.
  • ...and 1 more figure
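
The captions for Figures 3 and 5 refer to time-to-collision (TTC) and time headway (THW) as risk indicators. As a rough sketch, both can be computed for a simple one-dimensional car-following pair using the commonly used textbook definitions; the paper's exact formulation is not shown in this section and may differ (for example, by using bumper-to-bumper distance or handling lateral motion). The function and parameter names below are hypothetical.

```python
import math


def time_to_collision(gap_m: float, follower_speed_mps: float, leader_speed_mps: float) -> float:
    """TTC in seconds: remaining gap divided by closing speed.

    Returns math.inf when the follower is not closing in on the leader,
    and 0.0 when the gap is already zero or negative.
    """
    if gap_m <= 0.0:
        return 0.0
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed


def time_headway(gap_m: float, follower_speed_mps: float) -> float:
    """THW in seconds: gap divided by the follower's own speed.

    Returns math.inf when the follower is stationary.
    """
    if follower_speed_mps <= 0.0:
        return math.inf
    return max(gap_m, 0.0) / follower_speed_mps


# A follower at 15 m/s trailing a leader at 10 m/s with a 20 m gap.
ttc = time_to_collision(20.0, 15.0, 10.0)  # 20 / (15 - 10) = 4.0 s
thw = time_headway(20.0, 15.0)             # 20 / 15 s
```

TTC depends on relative speed and so only flags pairs that are actually closing, whereas THW depends on the follower's speed alone and flags tight following even when both vehicles travel at the same speed. That difference is why the two indicators are usually reported together.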