Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability

Haotian Xue, Alexandre Araujo, Bin Hu, Yongxin Chen

TL;DR

Diff-PGD introduces a diffusion-model-guided PGD framework that keeps adversarial examples within the natural image distribution by optimizing against a purified input $x_0$ produced via diffusion editing. By decoupling the adversarial loss from realism constraints, it supports digital, region-based, style-guided, and physical-world attacks with improved stealthiness, transferability, and anti-purification properties. The approach uses SDEdit to bridge the input and real data distributions, with DDIM-based sampling for speed, and it extends to practical variants such as Diff-rPGD, Diff-PGD with style prompts, and Diff-Phys for robust physical patches. An accelerated variant reduces the computational burden with a gradient approximation, making adversarial generation feasible on large-scale data. Overall, Diff-PGD advances the realism and controllability of adversarial samples, informing both attack and defense research.

Abstract

Neural networks are known to be susceptible to adversarial samples: small variations of natural examples crafted to deliberately mislead the models. While they can be easily generated using gradient-based techniques in digital and physical scenarios, they often differ greatly from the actual data distribution of natural images, resulting in a trade-off between strength and stealthiness. In this paper, we propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. By exploiting a gradient guided by a diffusion model, Diff-PGD ensures that adversarial samples remain close to the original data distribution while maintaining their effectiveness. Moreover, our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks. Compared with existing methods for generating natural-style adversarial samples, our framework enables the separation of optimizing adversarial loss from other surrogate losses (e.g., content/smoothness/style loss), making it more stable and controllable. Finally, we demonstrate that the samples generated using Diff-PGD have better transferability and anti-purification power than traditional gradient-based methods. Code will be released at https://github.com/xavihart/Diff-PGD.
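
The core idea in the abstract, taking the PGD gradient through a purification step so that the classifier only ever sees the purified sample, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: `purify` is a 3-tap moving average standing in for diffusion editing (it is not a diffusion model), the classifier is a linear score with a margin loss, and the names `purify_vjp`, `diff_pgd`, `eps`, `eta`, and `steps` are hypothetical.

```python
def purify(x):
    """Stand-in for diffusion editing: 3-tap moving average with edge clamping.
    ASSUMPTION: a real purifier would be a diffusion model, not a smoother."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]


def purify_vjp(g):
    """Back-propagate a gradient g through the linear smoother (its transpose)."""
    n = len(g)
    out = [0.0] * n
    for i in range(n):
        for j in (max(i - 1, 0), i, min(i + 1, n - 1)):
            out[j] += g[i] / 3.0
    return out


def score(x0, w):
    """Toy linear classifier: positive score means class +1."""
    return sum(wi * xi for wi, xi in zip(w, x0))


def sign(v):
    return (v > 0) - (v < 0)


def diff_pgd(x, w, y, eps=0.1, eta=0.02, steps=10):
    """PGD where the loss is evaluated on the purified sample.
    Loss is the margin loss -y * score(purify(x_adv)); the attacker ascends it."""
    x_adv = list(x)
    for _ in range(steps):
        grad_x0 = [-y * wi for wi in w]   # d(loss)/d(x0) for the linear score
        grad_x = purify_vjp(grad_x0)      # chain rule through the purifier
        x_adv = [xa + eta * sign(g) for xa, g in zip(x_adv, grad_x)]
        # Project back onto the l_inf ball of radius eps around x, then clip to [0, 1].
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    # Return both the raw iterate and its purified version.
    return x_adv, purify(x_adv)


if __name__ == "__main__":
    x = [0.5] * 6
    w = [1.0, 2.0, -1.0, 0.5, -2.0, 0.5]
    x_adv, x0_adv = diff_pgd(x, w, y=1)
    print("clean score:", score(purify(x), w))
    print("adversarial score:", score(x0_adv, w))
```

Because the stand-in purifier is linear, its gradient is available in closed form; with an actual diffusion model the same chain rule would require back-propagating through the sampling steps, which is presumably the cost that the gradient approximation mentioned in the TL;DR is meant to reduce.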

Paper Structure

This paper contains 35 sections, 11 equations, 21 figures, 5 tables, 4 algorithms.

Figures (21)

  • Figure 1: Comparison of Different Pipelines: (a) Traditional gradient-based adversarial sample generation, $x$ is the sample to be optimized, $l_{adv}$ is adversarial loss. (b) Customized adversarial sample generation with natural style (determined by prompt $p$): joint optimization of adversarial loss with other surrogate losses like prompt loss $l_p$ (e.g. style loss) and realistic loss $l_r$ (e.g. content loss, smooth loss). (c) Our proposed diffusion-based framework, $q$ is forward diffusion and $R_{\phi}$ is backward denoising, $x_0$ is the denoised sample.
  • Figure 2: Visualization of Adversarial Samples generated by Diff-PGD: adv-samples generated using PGD ($x_{\text{PGD}}$) tend to be unnatural, while Diff-PGD ($x^n_0$) can preserve the authenticity of adv-samples. Here $x$ is the original image, $\delta_{\text{PGD}}=x-x_{\text{PGD}}$ and $\delta^n_0=x-x^n_0$, and we scale up the $\delta$ value by five times for better observation. Zoom in on a computer screen for better visualization.
  • Figure 3: Visualization of Adversarial Samples generated by Diff-rPGD: Diff-rPGD can generate better regional attacks than PGD: the attacked region can better blend into the background. The attacked regions are defined using red bounding boxes, and $(+)$ means zoom-in.
  • Figure 4: Generating Adversarial Samples with Customized Style: Given the original image $x$, a style mask $M$, and a style reference image $x_s$, Diff-PGD can generate more realistic samples, even in cases where only local styles are given (e.g. only the door of the red car is provided as $x_s$).
  • Figure 5: Results of Physical-World Attacks: We show two scenarios of physical-world attacks: the first row shows untargeted attacks on a small object (a computer mouse), and the second row shows targeted attacks on a larger object (a backpack), where we set the target to be Yorkshire terrier. The sticks-photo pairs include the clean patch (green box), the AdvPatch patch (blue box), the AdvCam-generated patch (red box), and our Diff-Phys-generated patch (black box).
  • ...and 16 more figures
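
The forward diffusion $q$ named in the Figure 1(c) caption can be sketched with the standard DDPM closed form $x_K = \sqrt{\bar{\alpha}_K}\,x + \sqrt{1-\bar{\alpha}_K}\,\epsilon$, $\epsilon \sim \mathcal{N}(0, I)$. This is only a sketch under stated assumptions: the linear schedule and its endpoints (`beta_start`, `beta_end`, `T`) are common DDPM defaults, not values taken from this paper, and the function names are hypothetical. The backward denoiser $R_{\phi}$ is a trained network and is not shown.

```python
import math
import random


def alpha_bar(K, T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) over the first K steps of a linear schedule.
    ASSUMPTION: schedule values are common DDPM defaults, not from the paper."""
    prod = 1.0
    for t in range(K):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
    return prod


def forward_diffuse(x, K, rng=random):
    """Sample x_K ~ q(x_K | x) for a flat list of pixel values x."""
    a = alpha_bar(K)
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same noise
    print(forward_diffuse([0.2, 0.5, 0.8], K=100, rng=rng))
```

A larger `K` pushes the sample further toward pure noise, so in an SDEdit-style edit it controls how much of the input survives before denoising.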