CRASH: Challenging Reinforcement-Learning Based Adversarial Scenarios For Safety Hardening

Amar Kulkarni, Shangtong Zhang, Madhur Behl

TL;DR

CRASH controls adversarial Non Player Character (NPC) agents in an AV simulator to automatically induce collisions with the Ego vehicle, falsifying its motion planner. Safety hardening against these adversarial agents then reduces the Ego vehicle's collision rate by 26%.

Abstract

Ensuring the safety of autonomous vehicles (AVs) requires identifying rare but critical failure cases that on-road testing alone cannot discover. High-fidelity simulations provide a scalable alternative, but automatically generating realistic and diverse traffic scenarios that can effectively stress-test AV motion planners remains a key challenge. This paper introduces CRASH (Challenging Reinforcement-learning based Adversarial scenarios for Safety Hardening), an adversarial deep reinforcement learning framework that addresses this issue. First, CRASH controls adversarial Non Player Character (NPC) agents in an AV simulator to automatically induce collisions with the Ego vehicle, falsifying its motion planner. Second, we propose a novel approach, which we term safety hardening, that iteratively refines the motion planner by simulating improvement scenarios against the adversarial agents, leveraging the failure cases to strengthen the AV stack. CRASH is evaluated on a simplified two-lane highway scenario, where it falsifies both rule-based and learning-based planners with collision rates exceeding 90%. Additionally, safety hardening reduces the Ego vehicle's collision rate by 26%. While preliminary, these results highlight RL-based safety hardening as a promising approach to scenario-driven simulation testing for autonomous vehicles.
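
The alternation between falsification and safety hardening described in the abstract can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the function name `local_safety_hardening` and the callables `train_npc` and `train_ego` are hypothetical placeholders for whatever RL training routines are actually used, and the number of cycles is an assumed parameter.

```python
from typing import Callable, List, Tuple, TypeVar

Ego = TypeVar("Ego")  # stands in for an Ego motion-planner policy
Npc = TypeVar("Npc")  # stands in for an adversarial NPC policy


def local_safety_hardening(
    ego: Ego,
    npc: Npc,
    train_npc: Callable[[Npc, Ego], Npc],  # hypothetical: trains the NPC against a fixed Ego
    train_ego: Callable[[Ego, Npc], Ego],  # hypothetical: trains the Ego against a fixed NPC
    cycles: int,
) -> Tuple[Ego, Npc, List[Tuple[Ego, Npc]]]:
    """Alternate falsification and hardening for a fixed number of cycles."""
    history: List[Tuple[Ego, Npc]] = []
    for _ in range(cycles):
        # Falsification step: the Ego is held fixed while the NPC learns to crash into it.
        npc = train_npc(npc, ego)
        # Hardening step: the new NPC is held fixed while the Ego learns to avoid it.
        ego = train_ego(ego, npc)
        history.append((ego, npc))
    return ego, npc, history
```

Passing the trainers as arguments keeps the loop independent of any particular simulator or RL library, so the same skeleton applies to both rule-based and learned components as long as a training callable is supplied.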

Paper Structure

This paper contains 19 sections, 8 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: CRASH is a framework that iteratively tests and improves an AV motion planner. First, Automatic Falsification uses adversarial NPCs, guided by a reinforcement learning policy designed to induce collisions with the Ego vehicle. Then, Safety Hardening retrains the Ego motion planner to enhance its robustness against these adversarial scenarios.
  • Figure 2: (a) Coordinate system: the blue vehicle ($V_0$) is the Ego, and the green vehicle ($V_1$) is an NPC on a 2-lane highway. (b) Reward function: positive when the NPC approaches the Ego, negative when moving away. (c) Initial configurations: the NPC (green) spawns at 8 relative positions around the Ego (blue).
  • Figure 3: Automatic Falsification in CRASH: training architecture for the adversarial NPC agent using Double DQN. Simulator states ($S_t$) are input to the value network, which selects actions via the $\epsilon$-greedy method.
  • Figure 4: Local Safety Hardening over two cycles: in cycle $c$, the NPC $V_{c-1}$ (green) is trained against the fixed Ego $E_{c-1}$ (blue), producing $V_c$. Then, $V_c$ is fixed, and $E_{c-1}$ is trained to produce $E_c$. This process repeats in cycle $c+1$.
  • Figure 5: Uniform Sampling for model pool-based safety hardening: The NPC $V_{c-1}$ is trained against Ego agents sampled uniformly from the pool $E^P_{c-1}$, and the resulting NPC, $V_c$, is added to the NPC pool $V^P_{c}$. The newest Ego is then trained against NPCs sampled uniformly from $V^P_{c}$.
  • ...and 4 more figures
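
The Figure 3 caption names two standard ingredients: $\epsilon$-greedy action selection and Double DQN. The sketch below shows the textbook form of both in plain Python. It is an illustration under assumptions rather than the paper's code: the function names, the list-based Q-values, and the discount factor `gamma` are hypothetical, and the actual network architecture and hyperparameters are not given in this excerpt.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],  # online-network Q-values for the next state
    q_target_next: Sequence[float],  # target-network Q-values for the next state
    done: bool,
) -> float:
    """Standard Double DQN target: the online net selects, the target net evaluates."""
    if done:
        # Terminal transition (e.g. a collision ended the episode): no bootstrap term.
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))
    print(double_dqn_target(1.0, 0.5, [0.2, 0.8], [3.0, 2.0], done=False))
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from vanilla DQN, which takes the maximum over the target network's own values and tends to overestimate them.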