Boosting Imperceptibility of Stable Diffusion-based Adversarial Examples Generation with Momentum

Nashrah Haque, Xiang Li, Zhehui Chen, Yanzhao Wu, Lei Yu, Arun Iyengar, Wenqi Wei

TL;DR

Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, 35 percentage points above the state-of-the-art method (SD-NAE, 44%), while preserving the imperceptibility of adversarial perturbations and the semantic similarity to the original class label, making it a practical method for robust adversarial evaluation.
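The headline improvement can be checked against the per-setup rates in Figure 3 (4 benign, 44 SD-NAE, and 79 SD-MIAE misclassifications out of 100 images). Read as an absolute gap, the gain is 35 percentage points; measured relative to SD-NAE's rate, it would be roughly 80%. The snippet below simply makes that arithmetic explicit; the dictionary names are illustrative.

```python
# Misclassification counts over the 100 evaluation images reported in Figure 3.
TOTAL = 100
misclassified = {"benign": 4, "SD-NAE": 44, "SD-MIAE": 79}

# Convert counts to percentages.
rates = {name: 100 * n / TOTAL for name, n in misclassified.items()}

absolute_gain = rates["SD-MIAE"] - rates["SD-NAE"]      # percentage points
relative_gain = 100 * absolute_gain / rates["SD-NAE"]    # percent of SD-NAE's rate

print(f"absolute gain: {absolute_gain:.0f} percentage points")  # 35
print(f"relative gain: {relative_gain:.1f}%")                   # 79.5%
```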

Abstract

We propose a novel framework, Stable Diffusion-based Momentum Integrated Adversarial Examples (SD-MIAE), for generating adversarial examples that can effectively mislead neural network classifiers while maintaining visual imperceptibility and preserving the semantic similarity to the original class label. Our method leverages the text-to-image generation capabilities of the Stable Diffusion model by manipulating token embeddings corresponding to the specified class in its latent space. These token embeddings guide the generation of adversarial images that maintain high visual fidelity. The SD-MIAE framework consists of two phases: (1) an initial adversarial optimization phase that modifies token embeddings to produce misclassified yet natural-looking images and (2) a momentum-based optimization phase that refines the adversarial perturbations. By introducing momentum, our approach stabilizes the optimization of perturbations across iterations, enhancing both the misclassification rate and visual fidelity of the generated adversarial examples. Experimental results demonstrate that SD-MIAE achieves a high misclassification rate of 79%, 35 percentage points above the state-of-the-art method, while preserving the imperceptibility of adversarial perturbations and the semantic similarity to the original class label, making it a practical method for robust adversarial evaluation.
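The abstract describes the second phase only as a momentum-based refinement of the adversarial perturbations. One plausible reading, assumed here rather than confirmed by the text, is the MI-FGSM update of Dong et al. (2018): accumulate L1-normalized gradients into a buffer g with decay μ, step along sign(g), and project back into an L∞ ball of radius ε around the starting point. The sketch below works on plain Python lists with a toy gradient function. `momentum_refine` and its parameters are hypothetical, the L∞ reading of ε is an assumption, and the default μ = 1.0 merely mirrors the best value reported in Figure 5.

```python
def l1_normalize(grad):
    """Scale a gradient vector to unit L1 norm (an all-zero gradient is returned unchanged)."""
    norm = sum(abs(g) for g in grad)
    return [g / norm for g in grad] if norm > 0 else list(grad)


def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0


def momentum_refine(x0, grad_fn, epsilon, mu=1.0, steps=10):
    """ASSUMED MI-FGSM-style refinement (not confirmed as the paper's update).

    grad_fn(x) returns the gradient of the loss to be *increased* at x.
    """
    alpha = epsilon / steps          # per-step size, so steps * alpha == epsilon
    x = list(x0)
    g = [0.0] * len(x0)              # momentum buffer
    for _ in range(steps):
        grad = l1_normalize(grad_fn(x))
        # Decay the old direction by mu and add the new normalized gradient.
        g = [mu * gi + di for gi, di in zip(g, grad)]
        # Step along the sign of the accumulated direction.
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, g)]
        # Project back into the epsilon-ball around the starting point.
        x = [min(max(xi, x0i - epsilon), x0i + epsilon) for xi, x0i in zip(x, x0)]
    return x


# Toy usage: a linear loss w . x, whose gradient is the constant vector w.
w = [1.0, -2.0, 0.5]
x0 = [0.2, 0.5, 0.8]
adv = momentum_refine(x0, lambda x: w, epsilon=0.03)
print(adv)  # each coordinate has moved by about 0.03 in the direction of sign(w)
```

With μ = 0 the buffer reduces to the current normalized gradient and the update becomes plain iterative FGSM; a larger μ keeps the step direction stable across iterations, which is the stabilizing effect the abstract attributes to momentum.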

Paper Structure

This paper contains 22 sections, 5 equations, 12 figures, and 1 algorithm.

Figures (12)

  • Figure 1: Qualitative comparison of adversarial examples generated by SD-NAE and SD-MIAE. As the number of iterations increases, the semantic content of adversarial examples generated by SD-NAE gradually drifts away from the original class label, making them easy for human observers to detect. In contrast, SD-MIAE produces adversarial examples that remain visually imperceptible and semantically aligned with the original class label.
  • Figure 2: SD-MIAE workflow. Key steps include loading the models, generating text embeddings, performing forward diffusion, applying the SD-MIAE adversarial attack, classifying the image, and optimizing the token embeddings through backpropagation. The process repeats, refining the adversarial example until it is misclassified.
  • Figure 3: Comparison of misclassification rates across 100 images from 10 classes under the benign setup (4%), the SD-NAE setup (44%), and the SD-MIAE setup (79%).
  • Figure 4: Visualization of the effects of the SD-MIAE attack on a benign image. The left image shows the original image, classified as a Flamingo. The middle image depicts the adversarial perturbations optimized by our momentum-based refinement technique. The right image shows the adversarial image, misclassified as a Spoonbill with 99% probability.
  • Figure 5: Impact of increasing $\epsilon$ on the misclassification rate, using the best-performing value $\mu = 1.0$.
  • ...and 7 more figures
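The generate, classify, and update loop of Figure 2 can be sketched as below. Everything here is a toy assumption rather than the paper's implementation: `generate_image` and `classify` are hypothetical linear stand-ins for Stable Diffusion and the target classifier, central finite differences replace backpropagation, and the L2 penalty weighted by the hypothetical `lam` is only an illustrative proxy for keeping the token embedding semantically close to the original class.

```python
def generate_image(embedding):
    """HYPOTHETICAL stand-in for the Stable Diffusion pass: a fixed linear map
    from a 2-D 'token embedding' to a 3-D 'image'."""
    e0, e1 = embedding
    return [e0 + 0.5 * e1, e1, e0 - e1]


def classify(image):
    """HYPOTHETICAL two-class linear classifier; returns the logits."""
    return [image[0] + image[2], image[1]]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def objective(embedding, original, true_label, lam):
    """True-class margin over the best other class, plus an L2 penalty that keeps
    the embedding near the original (an assumed proxy for semantic similarity)."""
    logits = classify(generate_image(embedding))
    best_other = max(l for i, l in enumerate(logits) if i != true_label)
    penalty = sum((e - o) ** 2 for e, o in zip(embedding, original))
    return logits[true_label] - best_other + lam * penalty


def numeric_grad(f, x, h=1e-5):
    """Central finite differences, standing in for backpropagation."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def attack(embedding, true_label, lr=0.1, lam=0.01, max_iters=100):
    """Repeat generate -> classify -> gradient step, stopping at the first misclassification."""
    original = list(embedding)
    e = list(embedding)
    for step in range(max_iters):
        pred = argmax(classify(generate_image(e)))
        if pred != true_label:
            return e, pred, step
        # Descend on the objective: shrink the true-class margin, stay near the original.
        g = numeric_grad(lambda v: objective(v, original, true_label, lam), e)
        e = [ei - lr * gi for ei, gi in zip(e, g)]
    return e, argmax(classify(generate_image(e))), max_iters


adv_embedding, label, steps = attack([1.0, 0.0], true_label=0)
print(adv_embedding, label, steps)
```

The early exit mirrors the caption's "until it is misclassified" stopping rule; in a real pipeline the loop would typically also cap the number of iterations, as `max_iters` does here.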