SemDiff: Generating Natural Unrestricted Adversarial Examples via Semantic Attributes Optimization in Diffusion Models

Zeyu Dai, Shengcai Liu, Rui He, Jiahao Wu, Ning Lu, Wenqi Fan, Qing Li, Ke Tang

TL;DR

SemDiff introduces a semantic-latent-space UAE attack for diffusion models, addressing the naturalness gap left by prior UAE methods that perturb intermediate latents. By learning adversarial semantic attributes with CLIP guidance and optimizing multiple attributes via a weight-penalized objective, SemDiff generates UAEs with meaningful attribute changes that remain perceptually realistic. Across CelebA-HQ, AFHQ, and ImageNet, SemDiff achieves high attack success while outperforming baselines on BRISQUE, FID, and KID, and shows robustness to multiple defenses. This work highlights a vulnerability in current defenses to semantically guided UAE generation and motivates development of stronger robustness techniques against unrestricted adversarial threats.

Abstract

Unrestricted adversarial examples (UAEs) allow the attacker to create unconstrained adversarial examples without given clean samples, posing a severe threat to the safety of deep learning models. Recent works use diffusion models to generate UAEs. However, these UAEs often lack naturalness and imperceptibility because they are obtained by simply optimizing intermediate latent noise. In light of this, we propose SemDiff, a novel unrestricted adversarial attack that explores the semantic latent space of diffusion models for meaningful attributes and devises a multi-attribute optimization approach to ensure attack success while maintaining the naturalness and imperceptibility of the generated UAEs. We perform extensive experiments on four tasks across three high-resolution datasets: CelebA-HQ, AFHQ, and ImageNet. The results demonstrate that SemDiff outperforms state-of-the-art methods in terms of attack success rate and imperceptibility. The generated UAEs are natural and exhibit semantically meaningful changes that accord with the attributes' weights. In addition, SemDiff is found to be capable of evading different defenses, which further validates its effectiveness and the threat it poses.

Paper Structure

This paper contains 33 sections, 14 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Some UAEs generated by the proposed SemDiff. Note that the sampled images are not real-world images; they are generated by diffusion models from randomly sampled noise images. SemDiff optimizes the weights of multiple meaningful attributes to craft adversarial examples based on the sampled images. The weights accord with the semantic changes in the images; even a negative weight leads to an opposite but still meaningful change.
  • Figure 2: Overview of the proposed SemDiff. The yellow box illustrates that SemDiff only modifies $\mathbf{P}_{t}$ (blue lines) while preserving $\mathbf{D}_{t}$ (black lines) in the reverse process of DDIM to alter the semantics of the generated UAEs. SemDiff first trains a semantic function $\mathbf{F}_{i}(\boldsymbol{h}_{t}, t)$ to learn each adversarial semantic attribute with the loss shown in the blue box. Then, the weights of multiple attributes $\boldsymbol{w}_{it}$ are optimized with the objective shown in the green box. $\boldsymbol{f}$ is the target classifier to be attacked, and $\boldsymbol{g}$ is an auxiliary classifier used to maintain the class appearance. Note that we use the clearer $\mathbf{P}_{t}$ (within the red border) instead of $\boldsymbol{x}_t$ at each timestep in both function training and weight optimization.
  • Figure 3: Visual examples of the UAEs generated by different attacks for the gender classification task on CelebA-HQ. The first column shows the sampled images that the diffusion models reconstruct from randomly sampled noise images.
  • Figure 4: Visual examples of the UAEs generated by different attacks for the animal classification task on AFHQ. The first column shows the sampled images that the diffusion models reconstruct from randomly sampled noise images.
  • Figure 5: Hyperparameter analysis of $\lambda_{1}$ and $\lambda_{2}$ in SemDiff. The results are generated on the CelebA-HQ dataset against a ResNet50 model. We use ASR, Weight, BRISQUE, FID, and KID to measure the impact on attack effectiveness and generation quality.
  • ...and 1 more figure
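
The Figure 2 caption says SemDiff modifies only $\mathbf{P}_{t}$ and preserves $\mathbf{D}_{t}$ in the DDIM reverse process. The following is a minimal sketch of such an asymmetric step, assuming the standard deterministic DDIM update, in which $\mathbf{P}_{t}$ is the predicted clean image and $\mathbf{D}_{t}$ is the direction term pointing back to $\boldsymbol{x}_t$. The function names, the `eps_for_p` / `eps_for_d` arguments, and the use of NumPy arrays in place of a real noise-prediction network are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def predicted_x0(x_t, eps, alpha_bar_t):
    """P_t: the clean image implied by x_t and a noise prediction eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps) / np.sqrt(alpha_bar_t)


def asymmetric_ddim_step(x_t, eps_for_p, eps_for_d, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM step (eta = 0) with separate noise inputs.

    eps_for_p: noise prediction used for P_t (hypothetically, the one obtained
               after a semantic attribute shift is applied).
    eps_for_d: noise prediction used for D_t (the unmodified one).
    Passing the same array for both recovers the ordinary DDIM step.
    """
    p_t = predicted_x0(x_t, eps_for_p, alpha_bar_t)
    d_t = np.sqrt(1.0 - alpha_bar_prev) * eps_for_d
    return np.sqrt(alpha_bar_prev) * p_t + d_t


# Toy usage with random arrays standing in for images and noise predictions.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(3, 8, 8))
eps = rng.normal(size=(3, 8, 8))
a_t, a_prev = 0.5, 0.8  # cumulative alpha products at t and t-1 (made-up values)
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps

plain = asymmetric_ddim_step(x_t, eps, eps, a_t, a_prev)
eps_shifted = eps + 0.1  # placeholder for an edited noise prediction
edited = asymmetric_ddim_step(x_t, eps_shifted, eps, a_t, a_prev)
```

Because `eps_for_d` is left unchanged, the difference between `edited` and `plain` comes entirely from the change in the predicted clean image, scaled by the square root of `a_prev`.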