DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification

Mintong Kang, Dawn Song, Bo Li

TL;DR

This paper addresses the vulnerability of diffusion-based adversarial purification defenses, which use DDPMs or score-based models to cleanse inputs, by introducing DiffAttack. DiffAttack combines a deviated-reconstruction loss to induce inaccurate density-gradient estimates with a segment-wise forwarding-backwarding algorithm to enable memory-efficient backpropagation through the long diffusion paths. Empirically, it substantially reduces robust accuracy on CIFAR-10 and ImageNet compared with state-of-the-art attacks across both purification paradigms, and it demonstrates constant-memory scaling with diffusion length. The work provides insights into the robustness of diffusion purification and motivates developing more robust sampling and defense mechanisms that address gradient reliability and memory constraints in adversarial settings.

Abstract

Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples and achieve state-of-the-art robustness. Recent studies show that even advanced attacks cannot break such defenses effectively, since the purification process induces an extremely deep computational graph which poses the potential problems of gradient obfuscation, high memory cost, and unbounded randomness. In this paper, we propose a unified framework, DiffAttack, to perform effective and efficient attacks against diffusion-based purification defenses, including both DDPM and score-based approaches. In particular, we propose a deviated-reconstruction loss at intermediate diffusion steps to induce inaccurate density gradient estimation to tackle the problem of vanishing/exploding gradients. We also provide a segment-wise forwarding-backwarding algorithm, which leads to memory-efficient gradient backpropagation. We validate the attack effectiveness of DiffAttack compared with existing adaptive attacks on CIFAR-10 and ImageNet. We show that DiffAttack decreases the robust accuracy of models compared with SOTA attacks by over 20% on CIFAR-10 under $\ell_\infty$ attack $(\epsilon=8/255)$, and over 10% on ImageNet under $\ell_\infty$ attack $(\epsilon=4/255)$. We conduct a series of ablation studies, and we find 1) DiffAttack with the deviated-reconstruction loss added over uniformly sampled time steps is more effective than that added over only initial/final steps, and 2) diffusion-based purification with a moderate diffusion length is more robust under DiffAttack.
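
The segment-wise forwarding-backwarding idea can be illustrated with a toy scalar example. The sketch below is not the authors' implementation: `step` is a hypothetical stand-in for one purification step, and its derivative `step_grad` is written by hand instead of coming from automatic differentiation. The forward pass keeps only the intermediate states; the backward pass then revisits one segment at a time, so only one segment's local derivative has to exist at any moment.

```python
import math


def step(x, t):
    """Hypothetical stand-in for one purification step (not the paper's sampler)."""
    return x - 0.02 * (1.0 + 0.01 * t) * math.tanh(x)


def step_grad(x, t):
    """Hand-written derivative of step with respect to x (one segment's Jacobian)."""
    return 1.0 - 0.02 * (1.0 + 0.01 * t) * (1.0 - math.tanh(x) ** 2)


def forward_store_states(x0, num_steps):
    """Forward pass that stores only the intermediate states, not a full graph."""
    states = [x0]
    for t in range(num_steps):
        states.append(step(states[-1], t))
    return states


def segmentwise_backward(states, grad_output):
    """Walk the segments in reverse, rebuilding each local derivative from its stored input."""
    grad = grad_output
    for t in reversed(range(len(states) - 1)):
        grad *= step_grad(states[t], t)  # chain rule across a single segment
    return grad


states = forward_store_states(0.5, 50)
grad_x0 = segmentwise_backward(states, 1.0)  # d(final state) / d(x0)
print(grad_x0)
```

In a real attack each segment would presumably be a network call whose graph is rebuilt, backpropagated, and then freed, which is why the peak memory need not grow with the number of steps.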

Paper Structure

This paper contains 27 sections, 4 theorems, 42 equations, 9 figures, 8 tables, 2 algorithms.

Key Result

Theorem 1

Consider the adversarial sample $\tilde{\mathbf{x}}_0 := \mathbf{x}_0 + \delta$, where $\mathbf{x}_0$ is the clean example and $\delta$ is the perturbation. $p_t(\mathbf{x})$, $p'_t(\mathbf{x})$, $q_t(\mathbf{x})$, $q'_t(\mathbf{x})$ are the distributions of $\mathbf{x}_t$, $\mathbf{x}'_t$, [...] $C_1 = (L_u+8M^2) \int_t^T \beta(t)\, dt$, $C_2 = (8 (1-\prod_{s=1}^t (1-\beta_s)))^{-1}$.
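
To get a feel for the constant $C_2$, the following sketch evaluates it numerically. The linear schedule for $\beta_s$ (from $10^{-4}$ to $0.02$ over 1000 steps) is an assumption chosen for illustration and is not stated in the theorem; `linear_betas` and `c2` are hypothetical helper names.

```python
def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; the theorem itself does not fix a schedule."""
    if num_steps == 1:
        return [beta_start]
    span = beta_end - beta_start
    return [beta_start + span * i / (num_steps - 1) for i in range(num_steps)]


def c2(betas, t):
    """C_2 = (8 * (1 - prod_{s=1}^{t} (1 - beta_s)))^{-1}, with beta_s = betas[s - 1]."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return 1.0 / (8.0 * (1.0 - prod))


betas = linear_betas(1000)
for t in (1, 10, 100, 1000):
    print(t, c2(betas, t))
```

Under this assumed schedule, $C_2$ is large for small $t$ and decreases toward $1/8$ as the product $\prod_{s=1}^t (1-\beta_s)$ shrinks toward zero.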

Figures (9)

  • Figure 1: DiffAttack against diffusion-based adversarial purification defenses. DiffAttack features the deviated-reconstruction loss that addresses vanishing/exploding gradients and the segment-wise forwarding-backwarding algorithm that leads to memory-efficient gradient backpropagation.
  • Figure 2: The clean/robust accuracy (%) of diffusion-based purification with different diffusion length $T$ under DiffAttack on CIFAR-10 with WideResNet-28-10 under $\ell_\infty$ attack $(\epsilon=8/255)$.
  • Figure 3: Comparison of memory cost of gradient backpropagation between Blau et al. (2022) and DiffAttack with batch size $16$ on CIFAR-10 with WideResNet-28-10 under $\ell_\infty$ attack.
  • Figure 4: The impact of applying ${\mathcal{L}}_{dev}$ at different time steps on decreased robust accuracy (%). $T$ is the diffusion length and $\text{Uni}(0,T)$ represents uniform sampling.
  • Figure 5: Visualization of the clean images and adversarial samples generated by DiffAttack on CIFAR-10 with $\ell_\infty$ attack ($\epsilon=8/255$) against score-based purification with WideResNet-28-10.
  • ...and 4 more figures
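
The uniform time-step sampling $\text{Uni}(0,T)$ mentioned in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's loss: it assumes ${\mathcal{L}}_{dev}$ compares a forward-process state $\mathbf{x}_t$ with its reverse-process counterpart $\mathbf{x}'_t$ using a mean squared distance, it omits any time-dependent weighting, and the function names and toy trajectories are hypothetical.

```python
import random


def squared_distance(a, b):
    """Mean squared distance between two equally sized vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def deviated_reconstruction_loss(forward_states, reverse_states, num_samples, rng):
    """Average distance between x_t and x'_t over time steps drawn from Uni(0, T)."""
    last_step = len(forward_states) - 1
    sampled = [rng.randint(0, last_step) for _ in range(num_samples)]  # both ends inclusive
    total = sum(squared_distance(forward_states[t], reverse_states[t]) for t in sampled)
    return total / num_samples


# Toy trajectories: the reverse states deviate from the forward states by exactly 1.0.
forward_states = [[float(t), float(2 * t)] for t in range(11)]
reverse_states = [[v + 1.0 for v in state] for state in forward_states]

loss = deviated_reconstruction_loss(forward_states, reverse_states, 4, random.Random(0))
print(loss)
```

An attacker would maximize such a quantity with respect to the input perturbation; restricting `sampled` to only the first or last steps would give the initial/final variants that the ablation compares against.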

Theorems & Definitions (8)

  • Theorem 1
  • Remark
  • Lemma C.1
  • proof
  • Theorem 2: Theorem 1 in the main text
  • proof
  • Theorem 3
  • proof