SemDiff: Generating Natural Unrestricted Adversarial Examples via Semantic Attributes Optimization in Diffusion Models

Zeyu Dai; Shengcai Liu; Rui He; Jiahao Wu; Ning Lu; Wenqi Fan; Qing Li; Ke Tang

TL;DR

SemDiff is an unrestricted adversarial example (UAE) attack that operates in the semantic latent space of diffusion models, addressing the naturalness gap left by prior diffusion-based UAE methods that perturb intermediate latents. By learning adversarial semantic attributes with CLIP guidance and optimizing multiple attributes via a weight-penalized objective, SemDiff generates UAEs whose attribute changes are meaningful yet perceptually realistic. Across CelebA-HQ, AFHQ, and ImageNet, SemDiff achieves high attack success rates while outperforming baselines on BRISQUE, FID, and KID, and it remains effective against multiple defenses. The work highlights a vulnerability of current defenses to semantically guided UAE generation and motivates stronger robustness techniques against unrestricted adversarial threats.

Abstract

Unrestricted adversarial examples (UAEs) allow an attacker to create non-constrained adversarial examples without being given clean samples, posing a severe threat to the safety of deep learning models. Recent works use diffusion models to generate UAEs. However, these UAEs often lack naturalness and imperceptibility because they simply optimize intermediate latent noise. In light of this, we propose SemDiff, a novel unrestricted adversarial attack that explores the semantic latent space of diffusion models for meaningful attributes and devises a multi-attribute optimization approach to ensure attack success while maintaining the naturalness and imperceptibility of the generated UAEs. We perform extensive experiments on four tasks across three high-resolution datasets: CelebA-HQ, AFHQ, and ImageNet. The results demonstrate that SemDiff outperforms state-of-the-art methods in terms of attack success rate and imperceptibility. The generated UAEs are natural and exhibit semantically meaningful changes that accord with the attributes' weights. In addition, SemDiff is found to be capable of evading different defenses, which further validates its effectiveness and the threat it poses.
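
The idea of optimizing attribute weights under a penalty can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: a linear scorer stands in for the target classifier, two hand-picked vectors stand in for learned semantic attribute directions, and a quadratic penalty `lam * sum(w**2)` stands in for the weight penalty. The sketch only shows the trade-off between flipping the prediction and keeping the attribute weights small.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edit_latent(z0, directions, weights):
    """Shift the latent z0 along each attribute direction by its weight."""
    z = list(z0)
    for w, d in zip(weights, directions):
        z = [zi + w * di for zi, di in zip(z, d)]
    return z

def optimize_weights(z0, directions, u, b, lam=0.05, lr=0.5, steps=200):
    """Gradient descent on softplus(score) + lam * ||w||^2 (toy objective)."""
    weights = [0.0] * len(directions)
    for _ in range(steps):
        score = dot(u, edit_latent(z0, directions, weights)) + b
        sig = 1.0 / (1.0 + math.exp(-score))  # derivative of softplus(score)
        grad = [sig * dot(u, d) + 2.0 * lam * w
                for w, d in zip(weights, directions)]
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy setup: score > 0 means the original (true) class.
z0 = [1.0, 0.5, -0.2, 0.3]                  # stand-in for a clean latent
directions = [[1.0, 0.0, 0.0, 0.0],         # stand-in attribute direction 1
              [0.0, 1.0, 0.0, 0.0]]         # stand-in attribute direction 2
u, b = [1.0, -1.0, 0.5, 0.0], 0.0           # stand-in linear classifier

score_before = dot(u, z0) + b
weights = optimize_weights(z0, directions, u, b)
score_after = dot(u, edit_latent(z0, directions, weights)) + b
print(score_before > 0, score_after < 0, weights)
```

Raising `lam` in this toy keeps the weights closer to zero at the cost of a weaker attack, which mirrors the stated goal of balancing attack success against naturalness.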
