Steerable Adversarial Scenario Generation through Test-Time Preference Alignment

Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun

TL;DR

The paper tackles safety testing for autonomous driving by generating adversarial scenarios that can be steered at inference time. It reframes adversarial scenario generation as multi-objective preference alignment and introduces SAGE, which uses hierarchical group-based preference optimization (HGPO) to align trajectories offline while decoupling hard feasibility constraints from soft preferences. At test time, SAGE merges two expert policies (one adversarial, one realistic) via weight interpolation to traverse the Pareto front without retraining, with theoretical grounding in linear mode connectivity. Empirically, SAGE achieves a better balance of realism and adversariality in open-loop tests and yields improved performance and generalization in closed-loop driving policy training, while ablations validate the core design choices. The framework is applicable across backbone motion models and driving policies, offering a flexible, efficient path to targeted stress testing and robust policy learning.

Abstract

Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.
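
The test-time step described in the abstract, linearly interpolating the weights of two fine-tuned experts, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Params` type, the function name `interpolate_weights`, the parameter names, and the toy numbers are all assumptions made for illustration. The only detail taken from the source is that an adversarial weight in [0, 1] selects a point between a realistic expert and an adversarial expert.

```python
from typing import Dict, List

# Hypothetical flat parameter store: parameter name -> list of values.
Params = Dict[str, List[float]]


def interpolate_weights(theta_real: Params, theta_adv: Params, w_adv: float) -> Params:
    """Return (1 - w_adv) * theta_real + w_adv * theta_adv, parameter by parameter."""
    if not 0.0 <= w_adv <= 1.0:
        raise ValueError("w_adv must lie in [0, 1]")
    if theta_real.keys() != theta_adv.keys():
        raise ValueError("experts must share the same parameter names")
    merged: Params = {}
    for name, real_vals in theta_real.items():
        adv_vals = theta_adv[name]
        if len(real_vals) != len(adv_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [
            (1.0 - w_adv) * r + w_adv * a for r, a in zip(real_vals, adv_vals)
        ]
    return merged


# Toy experts with made-up numbers (assumption, not values from the paper).
theta_real = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
theta_adv = {"layer.weight": [4.0, 6.0], "layer.bias": [3.0]}

# Sweeping w_adv yields a family of merged parameter sets without retraining.
sweep = [interpolate_weights(theta_real, theta_adv, w) for w in (0.0, 0.5, 1.0)]
```

In this sketch the endpoints of the sweep recover the two experts exactly, and intermediate values of `w_adv` give parameter sets in between. Whether those intermediate models perform well is what the paper's linear mode connectivity analysis addresses.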

Paper Structure

This paper contains 87 sections, 4 theorems, 63 equations, 15 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Let the base reward functions $R_{\text{adv}}(\theta)$ and $R_{\text{real}}(\theta)$ be $L$-smooth and $m$-strongly concave in the local region of the fine-tuned optima. Let $\theta_1$ and $\theta_2$ be the optimal parameters for the two expert models (e.g., the expert-model equation) trained with mixing where $C(\mu, \beta, L, m)$ is a constant dependent on the user preference $\mu$, the expert weight

Figures (15)

  • Figure 1: Limitation of existing adversarial generation methods, our solution, and its application.
  • Figure 2: Behavioral realism comparison. Adversarial generation against the Replay policy.
  • Figure 3: SAGE generates diverse types of meaningful adversarial behaviors (Replay policy).
  • Figure 4: Pareto front and continuous performance transition at test time. (a) We compare the trade-off curves for different model merging strategies in terms of their adversariality and realism. (b) SAGE achieves smooth and continuous outcome control by varying the adversarial weight.
  • Figure 5: More aggressive behaviors are generated from SAGE by increasing $w_\text{adv}$ from 0 to 1 (see the appendix for more examples). Adversarial generation against the Replay policy.
  • ...and 10 more figures

Theorems & Definitions (7)

  • Definition 1: Perturbation-based Adversarial Optimization
  • Theorem 1: Suboptimality of Weight Interpolation
  • Proposition 1: Advantage of Weight Mixing over Output Ensembling
  • Theorem 2: Suboptimality Gap for Quadratic Rewards
  • Lemma 1
  • proof
  • Remark 1: Remark on the Dominance of Loss Curvature Benefit
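
Proposition 1 contrasts weight mixing with output ensembling. The sketch below only shows how the two operations differ; it does not reproduce the proposition or its proof. The one-parameter models, the function names, and the numbers are hypothetical. For a nonlinear model the two operations generally give different outputs, while for a model that is linear in its parameter they coincide.

```python
import math
from typing import Callable

Model = Callable[[float, float], float]  # (theta, x) -> output


def tanh_model(theta: float, x: float) -> float:
    """Toy nonlinear model (assumption for illustration)."""
    return math.tanh(theta * x)


def linear_model(theta: float, x: float) -> float:
    """Toy model that is linear in theta."""
    return theta * x


def weight_mixing(f: Model, theta_1: float, theta_2: float, w: float, x: float) -> float:
    """Evaluate ONE model at the interpolated parameters."""
    return f((1.0 - w) * theta_1 + w * theta_2, x)


def output_ensembling(f: Model, theta_1: float, theta_2: float, w: float, x: float) -> float:
    """Evaluate BOTH models and interpolate their outputs."""
    return (1.0 - w) * f(theta_1, x) + w * f(theta_2, x)


theta_1, theta_2, w, x = 0.5, 3.0, 0.5, 1.0  # made-up values

mix_nonlinear = weight_mixing(tanh_model, theta_1, theta_2, w, x)
ens_nonlinear = output_ensembling(tanh_model, theta_1, theta_2, w, x)
mix_linear = weight_mixing(linear_model, theta_1, theta_2, w, x)
ens_linear = output_ensembling(linear_model, theta_1, theta_2, w, x)
```

A practical difference visible in the sketch: weight mixing needs a single forward pass through one merged model, whereas output ensembling evaluates both experts for every input.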