Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun
TL;DR
The paper tackles safety testing for autonomous driving by generating adversarial scenarios that can be steered at inference time. It reframes adversarial scenario generation as multi-objective preference alignment and introduces SAGE, which uses hierarchical group-based preference optimization (HGPO) to align trajectories offline while decoupling hard feasibility constraints from soft preferences. At test time, SAGE merges two expert policies, one adversarial and one realistic, via weight interpolation to traverse the Pareto front without retraining, with theoretical grounding in linear mode connectivity. Empirically, SAGE achieves a better balance of realism and adversariality in open-loop tests and yields improved performance and generalization in closed-loop driving policy training, while ablations validate the core design choices. The framework is applicable across backbone motion models and driving policies, offering a flexible, efficient path to targeted stress testing and robust policy learning.
Abstract
Adversarial scenario generation is a cost-effective approach for safety assessment of autonomous driving systems. However, existing methods are often constrained to a single, fixed trade-off between competing objectives such as adversariality and realism. This yields behavior-specific models that cannot be steered at inference time, lacking the efficiency and flexibility to generate tailored scenarios for diverse training and testing requirements. In view of this, we reframe the task of adversarial scenario generation as a multi-objective preference alignment problem and introduce a new framework named Steerable Adversarial scenario GEnerator (SAGE). SAGE enables fine-grained test-time control over the trade-off between adversariality and realism without any retraining. We first propose hierarchical group-based preference optimization, a data-efficient offline alignment method that learns to balance competing objectives by decoupling hard feasibility constraints from soft preferences. Instead of training a fixed model, SAGE fine-tunes two experts on opposing preferences and constructs a continuous spectrum of policies at inference time by linearly interpolating their weights. We provide theoretical justification for this framework through the lens of linear mode connectivity. Extensive experiments demonstrate that SAGE not only generates scenarios with a superior balance of adversariality and realism but also enables more effective closed-loop training of driving policies. Project page: https://tongnie.github.io/SAGE/.
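The test-time step described in the abstract, linearly interpolating the weights of two fine-tuned experts, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interpolate_weights`, the mixing coefficient `lam`, the convention that `lam = 1` recovers the adversarial expert, and the toy parameter dictionaries are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def interpolate_weights(theta_adv: Weights, theta_real: Weights, lam: float) -> Weights:
    """Return (1 - lam) * theta_real + lam * theta_adv, parameter by parameter.

    Assumed convention: lam = 0 gives the realistic expert,
    lam = 1 gives the adversarial expert.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_adv.keys() != theta_real.keys():
        raise ValueError("both experts must share the same parameter names")

    merged: Weights = {}
    for name, w_real in theta_real.items():
        w_adv = theta_adv[name]
        if len(w_adv) != len(w_real):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - lam) * r + lam * a for r, a in zip(w_real, w_adv)]
    return merged


# Toy weights standing in for two experts fine-tuned from a shared backbone.
theta_real = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
theta_adv = {"layer.weight": [4.0, 2.0], "layer.bias": [-1.0]}

# Sweep the coefficient to obtain a family of policies without retraining.
policies = {lam: interpolate_weights(theta_adv, theta_real, lam) for lam in (0.0, 0.5, 1.0)}
```

In this sketch each value of `lam` yields one merged set of weights, so sweeping it corresponds to moving along the trade-off between realism and adversariality. Whether the merged policies behave well depends on the two experts being linearly mode connected, which is the property the paper analyzes.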
