Safety-Critical Scenario Generation Via Reinforcement Learning Based Editing

Haolan Liu; Liangjun Zhang; Siva Kumar Sastry Hari; Jishen Zhao

TL;DR

This work tackles the long-tail problem in autonomous vehicle safety by introducing a reinforcement-learning-based scenario editor that sequentially edits driving scenarios via actions like adding agents or perturbing trajectories. It combines an anchor-based risk score with a learned plausibility model (CVAE/autoregressive components) to generate challenging yet realistic safety-critical scenarios, trained with PPO. The framework supports flexible, high-dimensional scenario representations and outperforms prior methods in generating high-quality, diverse risk scenarios while maintaining plausibility. Empirical results in highway-env and Argoverse-informed settings show improved efficiency over black-box optimization and stronger realism compared to baselines, offering a practical tool for AV safety validation and testing.

Abstract

Generating safety-critical scenarios is essential for testing and verifying the safety of autonomous vehicles. Traditional optimization techniques suffer from the curse of dimensionality and limit the search space to fixed parameter spaces. To address these challenges, we propose a deep reinforcement learning approach that generates scenarios by sequential editing, such as adding new agents or modifying the trajectories of the existing agents. Our framework employs a reward function consisting of both risk and plausibility objectives. The plausibility objective leverages generative models, such as a variational autoencoder, to learn the likelihood of the generated parameters from the training datasets; it penalizes the generation of unlikely scenarios. Our approach overcomes the dimensionality challenge and explores a wide range of safety-critical scenarios. Our evaluation demonstrates that the proposed method generates safety-critical scenarios of higher quality compared with previous approaches.
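
The two-part reward described in the abstract can be illustrated with a minimal sketch. It assumes the reward is a weighted sum of a risk term and a log-likelihood plausibility term; the function names, the weight `plaus_weight`, and the diagonal Gaussian used as a stand-in for the learned generative model are all hypothetical and are not taken from the paper.

```python
import math


def gaussian_log_likelihood(params, mean, std):
    """Log-density of `params` under an independent (diagonal) Gaussian.

    Assumption: this simple density stands in for the learned generative
    model (e.g. a variational autoencoder) that scores plausibility.
    """
    total = 0.0
    for x, m, s in zip(params, mean, std):
        total += -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
    return total


def edit_reward(risk, params, mean, std, plaus_weight=0.1):
    """Hypothetical reward: risk plus a weighted plausibility term.

    Unlikely parameters have a low log-likelihood, so they are penalized
    even when the resulting scenario is risky.
    """
    return risk + plaus_weight * gaussian_log_likelihood(params, mean, std)


# Two edits with the same risk: one near the data mean, one far from it.
typical = edit_reward(1.0, [0.0, 0.0], mean=[0.0, 0.0], std=[1.0, 1.0])
unlikely = edit_reward(1.0, [5.0, 5.0], mean=[0.0, 0.0], std=[1.0, 1.0])
```

In this sketch the edit far from the data mean receives the lower reward, which is the qualitative behavior the plausibility objective is meant to produce.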

Paper Structure (27 sections, 3 equations, 6 figures, 2 tables)

Introduction
Related Work
Adversarial Generation
Generative Models
Reinforcement Learning
Structure Generation
Proposed Method
Problem Definition
Generation using RL
State space
Action space
Transition Dynamics
Reward function
Policy Network
Risky Model
...and 12 more sections

Figures (6)

Figure 1: (1) Adding exploration into scenario generation (newly added agents) helps generate high-quality safety-critical scenarios. (2) Jointly optimizing all the traffic agents (including agents $B$, $C$, and $D$) in the scenario is not necessary, and can even hinder the optimization process, resulting in suboptimal results. Our framework edits the scenario efficiently in an agent-wise manner and searches for desirable scenarios by sequential exploration.
Figure 2: Our scenario editor is based on reinforcement learning, with a policy network that optimizes the scenario iteratively. In each iteration, the network takes the current scenario as input and outputs a distribution of editing actions. To train the agent, we use the risk model and adversarial loss from the pretrained model as the reward. In this example, the policy network adds a new agent to the scenario. The circle indicates the endpoint for each agent's trajectory. The blue, green, and red lines indicate the trajectories of the ego vehicle, the newly added agent, and other agents in the scenario, respectively.
Figure 3: The editing actions supported by our framework.
Figure 4: The risk model evaluates the safety of the autonomous vehicle's driving plans. In the given scenario, there are three predefined plans: one blue and two green. The blue anchor leads to a collision, while the two green anchors are safe. The risk model counts the number of feasible driving plans and uses this information to guide the scenario editing process.
Figure 5: The original scenario (left) and our edited version (right). The dotted line represents the lane centerline, and the circle denotes the final point of the trajectory. The green and red lines represent the trajectories of traffic agents, with the green line highlighting the safety-critical agents. For clarity, we only show the nearby agents in the figure. The blue line represents the AV's trajectory. (1) lane changing. (2) intersection.
...and 1 more figure
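
The plan-counting idea in the Figure 4 caption can be sketched as follows. This is only an illustration under stated assumptions: plans and agent trajectories are lists of (x, y) waypoints sampled at the same timesteps, a plan is infeasible if it comes within a fixed `radius` of any agent at the same timestep, and the risk score is the fraction of infeasible plans. The function names, the radius, and the normalization are hypothetical and not taken from the paper.

```python
import math


def plan_is_feasible(plan, agent_trajs, radius=2.0):
    """Return True if the plan never comes within `radius` of any agent
    at the same timestep (a simplified collision check; an assumption)."""
    for t, (px, py) in enumerate(plan):
        for traj in agent_trajs:
            if t < len(traj):
                ax, ay = traj[t]
                if math.hypot(px - ax, py - ay) < radius:
                    return False
    return True


def count_feasible_plans(plans, agent_trajs, radius=2.0):
    """Count the predefined driving plans (anchors) that remain collision-free."""
    return sum(plan_is_feasible(p, agent_trajs, radius) for p in plans)


def risk_score(plans, agent_trajs, radius=2.0):
    """Hypothetical risk: fraction of plans that are no longer feasible."""
    return 1.0 - count_feasible_plans(plans, agent_trajs, radius) / len(plans)


# Three predefined plans, mirroring the caption: one collides, two are safe.
straight = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0)]
left = [(0.0, 0.0), (5.0, 3.5), (10.0, 3.5)]
right = [(0.0, 0.0), (5.0, -3.5), (10.0, -3.5)]
oncoming = [(20.0, 0.0), (15.0, 0.0), (10.0, 0.0)]  # meets `straight` at t=2

feasible = count_feasible_plans([straight, left, right], [oncoming])
risk = risk_score([straight, left, right], [oncoming])
```

Under these assumptions, an edit that blocks more of the ego vehicle's plans raises the score, which is one way such a count could guide the editing process.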
