When World Models Dream Wrong: Physical-Conditioned Adversarial Attacks against World Models

Zhixiang Guo; Siyuan Liang; Andras Balogh; Noah Lunberry; Rong-Cheng Tu; Mark Jelasity; Dacheng Tao

TL;DR

PhysCond-WMA is the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity.

Abstract

Generative world models (WMs) are increasingly used to synthesize controllable, sensor-conditioned driving videos, yet their reliance on physical priors exposes novel attack surfaces. In this paper, we present Physical-Conditioned World Model Attack (PhysCond-WMA), the first white-box world model attack that perturbs physical-condition channels, such as HDMap embeddings and 3D-box features, to induce semantic, logic, or decision-level distortion while preserving perceptual fidelity. PhysCond-WMA is optimized in two stages: (1) a quality-preserving guidance stage that constrains the reverse-diffusion loss below a calibrated threshold, and (2) a momentum-guided denoising stage that accumulates target-aligned gradients along the denoising trajectory to produce stable, temporally coherent semantic shifts. Extensive experiments show that the attack remains effective while increasing FID by about 9% and FVD by about 3.9% on average. In the targeted attack setting, the attack success rate (ASR) reaches 0.55. Downstream studies further show tangible risk: training on attacked videos decreases 3D detection performance by about 4% and worsens open-loop planning performance by about 20%. These findings reveal and quantify, for the first time, security vulnerabilities in generative world models, motivating the development of more comprehensive security checkers.
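The interplay of the two optimization stages can be illustrated with a toy sketch. This is not the authors' implementation: the linear "semantic readout" `w . cond`, the quadratic `quality_loss` standing in for the reverse-diffusion loss, the step size `alpha`, the momentum factor `mu`, and the rescaling used to keep the quality proxy below `tau` are all hypothetical assumptions, and both stages are folded into a single loop for brevity. It only shows how momentum accumulates target-aligned gradients over iterations while a constraint keeps a quality proxy under a threshold.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def target_loss_grad(cond, w, target):
    """Hypothetical linear readout r = w . cond; loss = (r - target)^2 and its gradient."""
    diff = dot(w, cond) - target
    return diff * diff, [2.0 * diff * wi for wi in w]


def quality_loss(delta, scale=1.0):
    """Hypothetical stand-in for the reverse-diffusion loss: scale * ||delta||^2."""
    return scale * dot(delta, delta)


def momentum_attack(cond, w, target, tau, steps=300, alpha=0.05, mu=0.9, scale=1.0):
    """Toy attack on a condition vector: heavy-ball momentum plus a quality threshold tau."""
    delta = [0.0] * len(cond)
    momentum = [0.0] * len(cond)
    for _ in range(steps):  # toy stand-in for iterating along the denoising trajectory
        perturbed = [c + d for c, d in zip(cond, delta)]
        _, grad = target_loss_grad(perturbed, w, target)
        # Accumulate target-aligned gradients across iterations (heavy-ball momentum)
        momentum = [mu * m + g for m, g in zip(momentum, grad)]
        delta = [d - alpha * m for d, m in zip(delta, momentum)]
        # Quality constraint: rescale delta so that quality_loss(delta) <= tau
        q = quality_loss(delta, scale)
        if q > tau:
            factor = math.sqrt(tau / q)
            delta = [d * factor for d in delta]
    return delta


if __name__ == "__main__":
    cond = [1.0, 0.0]  # hypothetical "physical-condition embedding"
    w = [1.0, 1.0]     # hypothetical readout weights
    delta = momentum_attack(cond, w, target=2.0, tau=10.0)
    print(round(dot(w, [c + d for c, d in zip(cond, delta)]), 3))
```

With a loose threshold the readout is driven to the target; with a tight threshold the perturbation stops at the quality boundary, trading attack strength for fidelity, which mirrors the trade-off the threshold ablation in Figure 5 studies.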

Paper Structure (22 sections, 11 equations, 10 figures, 5 tables, 1 algorithm)

Introduction
Related work
    World Models in Autonomous Driving
    Diffusion-Based Adversarial Attacks
Preliminaries
    Generative World Model
    Problem Definition
Approach
    Quality-preserving Guidance Stage
    Denoising Optimization Stage
    Attack Pipeline and Implementation Details
Experiments
    Experimental Setup
    Main results
    Ablation Study
...and 7 more sections

Figures (10)

Figure 1: Adversarial attack on a generative world model. By conducting white-box attacks on the generative world model, we aim to change the semantics of the generated results while preserving their quality, yielding an adversarial attack that affects downstream tasks.
Figure 2: Overall framework of PhysCond-WMA.
Figure 3: Visualization of untargeted and targeted PhysCond-WMA attacks.
Figure 4: Visualization of PhysCond-WMA. GPT-5 judges the attack a failure, whereas a human evaluator judges it a success.
Figure 5: Ablation study on loss function threshold $\tau$
...and 5 more figures