LoRID: Low-Rank Iterative Diffusion for Adversarial Purification

Geigh Zollicoffer, Minh Vu, Ben Nebgen, Juan Castorena, Boian Alexandrov, Manish Bhattarai

TL;DR

This paper introduces LoRID, a Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbations with low intrinsic purification error. It reports superior robustness on the CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.

Abstract

This work presents an information-theoretic examination of diffusion-based purification methods, the state-of-the-art adversarial defenses that use diffusion models to remove malicious perturbations from adversarial examples. By theoretically characterizing the inherent purification errors of Markov-based diffusion purification, we introduce LoRID, a novel Low-Rank Iterative Diffusion purification method designed to remove adversarial perturbations with low intrinsic purification error. LoRID centers on a multi-stage purification process that combines multiple rounds of diffusion-denoising loops at the early time-steps of the diffusion model with Tucker decomposition, an extension of matrix factorization, to remove adversarial noise in high-noise regimes. Consequently, LoRID increases the effective number of diffusion time-steps and overcomes strong adversarial attacks, achieving superior robustness on the CIFAR-10/100, CelebA-HQ, and ImageNet datasets under both white-box and black-box settings.

Paper Structure

This paper contains 32 sections, 11 theorems, 45 equations, 4 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Let $\left\{\mathbf{x}^{(i)}_t\right\}_{t \in \{0,\dots,T\}}$, $i \in \{1,2\}$, be two diffusion processes given by the forward equation of a DDPM. Denote by $q^{(1)}_t$ and $q^{(2)}_t$ the distributions of $\mathbf{x}_t^{(1)}$ and $\mathbf{x}_t^{(2)}$, respectively. Then, for all $t \in \{0, \dots$ [...]
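
For orientation, the forward equation that the theorem refers to is not reproduced on this page. Assuming it follows the standard DDPM formulation, with a variance schedule $\beta_t$ and $\bar{\alpha}_t$ as the usual cumulative product (both symbols are assumptions here, not taken from the text above), the forward process reads:

```latex
\begin{align}
q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) &= \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right), \\
q(\mathbf{x}_t \mid \mathbf{x}_0) &= \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s).
\end{align}
```

Under this form, two processes started from different inputs are pushed through the same Gaussian kernel at every step, which is the setting in which the distributions $q^{(1)}_t$ and $q^{(2)}_t$ are compared.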

Figures (4)

  • Figure 1: The MMSEs induced by Markov-based purification against the iterative factor $L$ (Corollary \ref{corollary:clean}): each point is the MMSE of the data reconstructed from a normalized Gaussian through $L$ iterative loops of $t/L$ diffusion-denoising calls. Thus, points on a line share the same effective denoising step $t = (t/L) \times L$. The key observation is that the purification error generally decreases as $L$ increases. The samples on the right compare clean samples, samples purified with a single large time-step $t/L=600$, and samples purified with the same effective denoising step $t$ but a larger iterative factor $L=10$ (details in Appx. \ref{appx:mmse}).
  • Figure 2: The overall purification process of LoRID: given an input image $\mathbf{x}$, LoRID first transforms the image into a tensor and applies tensor factorization to eliminate some of the adversarial perturbation. Then, multiple loops of diffusion and denoising, denoted by $f_t$ and $r_t$, are applied at the early stages of the diffusion model to obtain the final purified image $\hat{\mathbf{x}}$.
  • Figure 3: Illustration of adversarial purification using DDPM. The adversarial samples (left) are purified from time-step $t=200$ (bottom) and $t=500$ (top) to recover the original samples (right). The middle columns show $\hat{\mathbf{x}}_{t}$ (Equation (\ref{eq:reverse_iterative})) obtained by iteratively denoising to the indicated intermediate time-steps. The top purification, which uses an overly large time-step, induces unavoidable error (Theorem \ref{theorem:ddpm_time}).
  • Figure 4: Impact of the time-step $t$ and iterative factor $L$ on the standard accuracy of WideResNet-28-10 on the CIFAR-100 dataset.
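
The pipeline described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Tucker step is approximated with a truncated higher-order SVD (a simple, non-iterative way to obtain a Tucker-style low-rank approximation), the noise schedule is the common linear DDPM schedule, and `toy_denoise` is a hypothetical placeholder for the reverse process $r_t$, which in practice would be a trained diffusion model. All function names and default values are illustrative.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))  # new axis lands first
    return np.moveaxis(out, 0, mode)

def tucker_lowrank(tensor, ranks):
    """Truncated HOSVD (assumed stand-in for the paper's Tucker step)."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    approx = tensor
    for mode, u in enumerate(factors):
        # Project onto the leading subspace of each mode: x <- U U^T x.
        approx = mode_product(mode_product(approx, u.T, mode), u, mode)
    return approx

def alpha_bar(num_steps=1000, beta_min=1e-4, beta_max=2e-2):
    """Cumulative product of (1 - beta_t) for an assumed linear schedule."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    return np.cumprod(1.0 - betas)

def diffuse(x, t, abar, rng):
    """Forward map f_t: sample x_t from x_0 in closed form (t is 1-based)."""
    noise = rng.standard_normal(x.shape)
    return np.sqrt(abar[t - 1]) * x + np.sqrt(1.0 - abar[t - 1]) * noise

def toy_denoise(x_t, t, abar):
    """Placeholder for r_t: only undoes the deterministic scaling."""
    return x_t / np.sqrt(abar[t - 1])

def lorid_sketch(x, denoise, ranks, t, loops, rng):
    """Low-rank projection first, then `loops` diffusion-denoising rounds."""
    x = tucker_lowrank(x, ranks)
    abar = alpha_bar()
    for _ in range(loops):
        x = denoise(diffuse(x, t, abar, rng), t, abar)
    return x

rng = np.random.default_rng(0)
image = rng.standard_normal((3, 8, 8))  # stand-in for a C x H x W image
purified = lorid_sketch(image, toy_denoise, ranks=(3, 4, 4), t=50, loops=3, rng=rng)
print(purified.shape)
```

With `loops` rounds at a small time-step `t`, the effective denoising step is `t * loops`, which mirrors the $t = (t/L) \times L$ bookkeeping in the Figure 1 caption. The placeholder denoiser does not remove noise, so the output here only demonstrates the control flow.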

Theorems & Definitions (16)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Corollary 1
  • Theorem 4
  • Theorem 5
  • Theorem
  • proof
  • Theorem
  • proof
  • ...and 6 more