
Challenger: Affordable Adversarial Driving Video Generation

Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao

TL;DR

Challenger introduces a unified framework for generating photorealistic adversarial driving videos by coupling a diffusion-based trajectory generator with physics-aware planning and multiview neural rendering. The approach yields diverse, physically plausible adversarial maneuvers and renders them into six-camera videos, enabling stress-testing of state-of-the-art end-to-end autonomous driving models. Experiments on Adv-nuSc show substantial increases in collision rates across multiple models and demonstrate transferability of adversarial behaviors, highlighting shared vulnerabilities. The work provides a scalable tool for robustness evaluation while discussing limitations and directions for extending to closed-loop settings and additional modalities.

Abstract

Generating photorealistic driving videos has seen significant progress recently, but current methods largely focus on ordinary, non-adversarial scenarios. Meanwhile, efforts to generate adversarial driving scenarios often operate on abstract trajectory or BEV representations, falling short of delivering realistic sensor data that can truly stress-test autonomous driving (AD) systems. In this work, we introduce Challenger, a framework that produces physically plausible yet photorealistic adversarial driving videos. Generating such videos poses a fundamental challenge: it requires jointly optimizing over the space of traffic interactions and high-fidelity sensor observations. Challenger makes this affordable through two techniques: (1) a physics-aware multi-round trajectory refinement process that narrows down candidate adversarial maneuvers, and (2) a tailored trajectory scoring function that encourages realistic yet adversarial behavior while maintaining compatibility with downstream video synthesis. When tested on the nuScenes dataset, Challenger generates a diverse range of aggressive driving scenarios, including cut-ins, sudden lane changes, tailgating, and blind-spot intrusions, and renders them into multiview photorealistic videos. Extensive evaluations show that these scenarios significantly increase the collision rate of state-of-the-art end-to-end AD models (UniAD, VAD, SparseDrive, and DiffusionDrive), and importantly, adversarial behaviors discovered for one model often transfer to others.
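The abstract describes the trajectory scoring function only at a high level: it should reward adversarial behavior while keeping it realistic and renderable. As an illustration of how one score can balance those three goals, here is a hypothetical weighted sum, not the authors' formulation. The function name `score_trajectory`, the weights, the 2 m margin, the finite-difference acceleration penalty, and the use of distance to the logged trajectory as a renderability proxy are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in meters


def _dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def score_trajectory(adv: List[Point], ego: List[Point], reference: List[Point],
                     dt: float = 0.5, margin: float = 2.0, w_adv: float = 1.0,
                     w_smooth: float = 0.1, w_ref: float = 0.2) -> float:
    """Hypothetical score (higher is better); not the paper's actual function."""
    # Adversarial pressure: a smaller closest approach to the ego scores higher,
    # but gaps below `margin` earn no extra credit, so ramming is not rewarded.
    min_gap = min(_dist(a, e) for a, e in zip(adv, ego))
    adversarial = -max(min_gap, margin)

    # Realism: penalize mean squared acceleration from second finite differences.
    accel_sq = 0.0
    for p0, p1, p2 in zip(adv, adv[1:], adv[2:]):
        ax = (p2[0] - 2.0 * p1[0] + p0[0]) / dt ** 2
        ay = (p2[1] - 2.0 * p1[1] + p0[1]) / dt ** 2
        accel_sq += ax * ax + ay * ay
    realism = -accel_sq / max(len(adv) - 2, 1)

    # Renderability proxy (assumption): stay close to the logged trajectory, where
    # the neural renderer has actually observed the vehicle and its surroundings.
    deviation = sum(_dist(a, r) for a, r in zip(adv, reference)) / len(adv)

    return w_adv * adversarial + w_smooth * realism - w_ref * deviation


# Usage: the ego drives along y = 0 at 10 m/s; the adversary's logged path is in
# the adjacent lane (y = 3.5). A gradual cut-in toward the ego outscores keeping lane.
ego = [(5.0 * t, 0.0) for t in range(8)]
logged = [(5.0 * t + 3.0, 3.5) for t in range(8)]
cut_in = [(5.0 * t + 3.0, 3.5 - 0.4 * t) for t in range(8)]
print(score_trajectory(cut_in, ego, logged) > score_trajectory(logged, ego, logged))  # True
```

The margin cap is one simple way to express "adversarial but not suicidal": once the adversary is within the margin, only the realism and renderability terms still influence the ranking.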

Paper Structure

This paper contains 29 sections, 5 equations, 17 figures, 5 tables, 1 algorithm.

Figures (17)

  • Figure 1: Photorealistic adversarial driving videos generated by Challenger. Each scenario includes an adversarial vehicle that is intentionally designed to challenge the ego vehicle through aggressive or unexpected maneuvers; it is highlighted with a white 3D bounding box in the camera views and drawn as a red rectangle in the bird's-eye view (BEV) map. Challenger autonomously produces these videos using a diffusion-based trajectory generator, physics-aware planning, and multiview neural rendering, with an affordable computation budget. Readers are encouraged to zoom in on the BEV maps and camera views for detailed inspection. Videos are available at our project page.
  • Figure 2: Challenger Overview. Challenger first ingests 3D bounding boxes and BEV road maps from a real-world dataset (e.g., nuScenes) to initialize a driving scene. It then randomly selects a background vehicle to act as the adversarial agent, leaving all other participants unchanged. At fixed keyframes, Challenger plans the adversarial vehicle’s trajectory and executes it continuously between keyframes. At each planning keyframe, a multi-round refinement process explores the trajectory space efficiently to generate an adversarial trajectory. In the first round, a batch of candidate trajectories $\tau \in \mathbb{R}^{B \times T \times 2}$ ($B$ candidates, $T$ timesteps, 2D positions) is sampled from a diffusion-based generator. To ensure physical feasibility, a physics-aware planning simulator with an LQR controller and a kinematic model simulates the motion of the adversarial vehicle. The simulated trajectories are then scored, and the top-performing trajectories are resampled (with replacement), perturbed by noise, denoised via the diffusion model, and fed back into the planning simulator for subsequent rounds. This iterative process runs for a fixed number of rounds, progressively narrowing the trajectory space, after which the best simulated trajectory is selected as the final adversarial driving plan for that keyframe. The generated trajectory modifies the original driving scene, and this process is repeated across all keyframes. Finally, a multiview neural renderer synthesizes photorealistic video of the final adversarial scenario.
  • Figure 2: Ablation study. We show the effectiveness of the multi-round trajectory refinement process (MTR) and trajectory scoring process (TS).
  • Figure 3: Failure cases of an E2E AD model in adversarial scenarios generated by Challenger.
  • Figure 4: Quantitative evaluation of video quality. We report the subject consistency (SC) and imaging quality (IQ) of the datasets.
  • ...and 12 more figures
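The multi-round refinement described in the Figure 2 overview caption is, structurally, a sample, simulate, score, resample loop. The toy sketch below shows that loop under loudly stated assumptions: `sample_candidates` and `denoise` are hypothetical stand-ins for the diffusion generator (random lateral drifts and a moving-average smoother, not a diffusion model), and `simulate` uses a kinematic bicycle model with a pure-pursuit steering law in place of the LQR controller the caption describes. All function names, parameters, and the scoring function in the usage example are illustrative.

```python
import math
import random
from typing import Callable, List, Tuple

Traj = List[Tuple[float, float]]  # waypoints (x, y) in meters, one per step


def sample_candidates(rng: random.Random, start, batch: int, horizon: int,
                      dt: float = 0.5, speed: float = 8.0) -> List[Traj]:
    # Hypothetical stand-in for the diffusion generator: constant forward speed
    # plus a random constant lateral drift (e.g., toward the ego's lane).
    plans = []
    for _ in range(batch):
        lat = rng.uniform(-1.5, 1.5)
        plans.append([(start[0] + speed * dt * (t + 1), start[1] + lat * dt * (t + 1))
                      for t in range(horizon)])
    return plans


def denoise(traj: Traj) -> Traj:
    # Stand-in for diffusion denoising: a 3-point moving average.
    n = len(traj)
    out = []
    for i in range(n):
        win = traj[max(i - 1, 0):min(i + 2, n)]
        out.append((sum(p[0] for p in win) / len(win), sum(p[1] for p in win) / len(win)))
    return out


def simulate(start, heading: float, speed: float, plan: Traj, dt: float = 0.5,
             wheelbase: float = 2.7, max_steer: float = 0.5, max_accel: float = 4.0) -> Traj:
    # Kinematic bicycle model tracking `plan` waypoint by waypoint. Steering uses a
    # pure-pursuit law, a swapped-in substitute for the paper's LQR controller.
    x, y, yaw, v = start[0], start[1], heading, speed
    out = []
    for tx, ty in plan:
        dist = math.hypot(tx - x, ty - y)
        alpha = math.atan2(ty - y, tx - x) - yaw
        alpha = math.atan2(math.sin(alpha), math.cos(alpha))  # wrap to [-pi, pi]
        steer = math.atan2(2.0 * wheelbase * math.sin(alpha), max(dist, 1e-3))
        steer = max(-max_steer, min(max_steer, steer))
        # Clamped acceleration toward the speed needed to reach the waypoint in one step.
        accel = max(-max_accel, min(max_accel, (dist / dt - v) / dt))
        v = max(0.0, v + accel * dt)
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        yaw += v / wheelbase * math.tan(steer) * dt
        out.append((x, y))
    return out


def refine(score: Callable[[Traj], float], start, heading: float, speed: float,
           rounds: int = 3, batch: int = 32, top_k: int = 8, horizon: int = 6,
           noise: float = 0.3, seed: int = 0) -> Tuple[Traj, float]:
    rng = random.Random(seed)
    plans = sample_candidates(rng, start, batch, horizon)
    best, best_score = None, -math.inf
    for _ in range(rounds):
        # Simulate every plan so that only physically feasible motion is scored.
        sims = [simulate(start, heading, speed, p) for p in plans]
        sims.sort(key=score, reverse=True)
        if score(sims[0]) > best_score:
            best, best_score = sims[0], score(sims[0])
        elites = sims[:top_k]
        # Resample elites with replacement, perturb with Gaussian noise, then denoise.
        plans = [denoise([(x + rng.gauss(0.0, noise), y + rng.gauss(0.0, noise))
                          for x, y in rng.choice(elites)])
                 for _ in range(batch)]
    return best, best_score


# Usage: the ego drives along y = 0 at 12 m/s; the adversary starts ahead in the
# adjacent lane (y = 3.5). Toy score: a closer approach to the ego is better, capped at 2 m.
ego = [(12.0 * 0.5 * (t + 1), 0.0) for t in range(6)]


def adv_score(traj: Traj) -> float:
    gap = min(math.hypot(a[0] - e[0], a[1] - e[1]) for a, e in zip(traj, ego))
    return -max(gap, 2.0)


best_traj, best_val = refine(adv_score, start=(5.0, 3.5), heading=0.0, speed=8.0)
```

Because every candidate passes through the simulator before scoring, the returned plan is always one the vehicle model can actually execute; the resample, perturb, denoise step then concentrates later rounds around the most adversarial feasible motions found so far.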