TRAIL: Transferable Robust Adversarial Images via Latent diffusion

Yuhao Xue; Zhifei Zhang; Xinyang Jiang; Yifei Shen; Junyao Gao; Wentao Gu; Jiale Zhao; Miaojing Shi; Cairong Zhao

TL;DR

TRAIL tackles the transferability gap in unrestricted adversarial attacks by performing test-time adaptation of a latent diffusion model, so that the synthesized perturbations follow the real-world image distribution $p(x)$ while embedding robust features $p(x+\delta)$. It optimizes an adversarial loss $\mathcal{L}_{adv}$ together with a distance loss $\mathcal{L}_{dis}$ during diffusion denoising, uses gradient guidance $\mathcal{G}_t$ to steer generation, and employs one-step backpropagation for efficiency. Empirically, TRAIL delivers superior cross-model transferability across CNNs and ViTs, bypasses common defenses, and even enables black-box attacks on vision-language models. A theoretical proposition bounds the latent perturbations under diffusion dynamics. The work highlights distribution-aligned adversarial feature synthesis as crucial for practical black-box attacks and introduces a new attack paradigm with potential security implications.

Abstract

Adversarial attacks exploiting unrestricted natural perturbations present severe security risks to deep learning systems, yet their transferability across models remains limited due to distribution mismatches between generated adversarial features and real-world data. While recent works utilize pre-trained diffusion models as adversarial priors, they still face a shift between the distribution of ideal adversarial samples and the natural image distribution learned by the diffusion model. To address this challenge, we propose Transferable Robust Adversarial Images via Latent Diffusion (TRAIL), a test-time adaptation framework that enables the model to generate images that carry adversarial features and closely resemble the target images. To mitigate the distribution shift, TRAIL updates the diffusion U-Net's weights during the attack by combining adversarial objectives (to mislead victim models) and perceptual constraints (to preserve image realism). The adapted model then generates adversarial samples through iterative noise injection and denoising guided by these objectives. Experiments demonstrate that TRAIL significantly outperforms state-of-the-art methods in cross-model attack transferability, validating that distribution-aligned adversarial feature synthesis is critical for practical black-box attacks.
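
The trade-off between an adversarial objective and a distance constraint can be illustrated with a deliberately tiny sketch. This is not TRAIL's pipeline: there is no diffusion model or U-Net here. The linear two-class "victim", the margin-based stand-in for $\mathcal{L}_{adv}$, the squared-L2 stand-in for $\mathcal{L}_{dis}$, and the parameters `lam`, `lr`, and `steps` are all assumptions chosen for illustration.

```python
# Toy illustration only: gradient descent on L_adv + lam * L_dis for a linear
# classifier. All names and hyperparameters are hypothetical, not from TRAIL.

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def predict(x, W):
    """Return the index of the highest-scoring class under weights W."""
    scores = [dot(w, x) for w in W]
    return scores.index(max(scores))

def combined_loss(x_adv, x, W, true_cls, other_cls, lam):
    # Adversarial term: margin of the true class over another class
    # (lower means closer to misclassification).
    l_adv = dot(W[true_cls], x_adv) - dot(W[other_cls], x_adv)
    # Distance term: squared L2 distance to the clean input.
    l_dis = sum((a - b) ** 2 for a, b in zip(x_adv, x))
    return l_adv + lam * l_dis

def combined_grad(x_adv, x, W, true_cls, other_cls, lam):
    # Analytic gradient of combined_loss with respect to x_adv.
    return [
        (wt - wo) + 2.0 * lam * (a - b)
        for wt, wo, a, b in zip(W[true_cls], W[other_cls], x_adv, x)
    ]

def attack(x, W, true_cls, other_cls, lam=1.0, lr=0.1, steps=50):
    """Minimize the combined loss by plain gradient descent, starting at x."""
    x_adv = list(x)
    for _ in range(steps):
        g = combined_grad(x_adv, x, W, true_cls, other_cls, lam)
        x_adv = [a - lr * gi for a, gi in zip(x_adv, g)]
    return x_adv

if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical victim weights
    x = [0.6, 0.4, 0.0]                     # clean input, classified as 0
    x_adv = attack(x, W, true_cls=0, other_cls=1)
    print(predict(x, W), predict(x_adv, W), x_adv)
```

In this toy setting a larger `lam` keeps the perturbed input closer to the original at the cost of a weaker attack, which mirrors, in a much simpler form, the tension between misleading the victim model and preserving image realism described in the abstract.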
