
Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification

Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu

TL;DR

Sample-specific Score-aware Noise Injection (SSNI) is a new framework that uses a pre-trained score network to estimate how far a data point deviates from the clean data distribution, and then applies a reweighting function to adaptively adjust the noise level $t^*$ for each sample, yielding sample-specific noise injection.

Abstract

Diffusion-based purification (DBP) methods aim to remove adversarial noise from an input sample by first injecting Gaussian noise through a forward diffusion process and then recovering the clean example through a reverse generative process. How much Gaussian noise is injected into the input sample is key to the success of DBP methods; in existing methods, it is controlled by a constant noise level $t^*$ shared by all samples. In this paper, we find that the optimal $t^*$ can differ from sample to sample. Intuitively, the cleaner a sample is, the less noise should be injected into it, and vice versa. Motivated by this finding, we propose a new framework, called Sample-specific Score-aware Noise Injection (SSNI). Specifically, SSNI uses a pre-trained score network to estimate how much a data point deviates from the clean data distribution (i.e., its score norm). Then, based on the magnitude of the score norm, SSNI applies a reweighting function to adaptively adjust $t^*$ for each sample, achieving sample-specific noise injection. Empirically, incorporating our framework into existing DBP methods yields notable improvements in both accuracy and robustness on CIFAR-10 and ImageNet-1K, highlighting the necessity of allocating distinct noise levels to different samples in DBP methods. Our code is available at: https://github.com/tmlr-group/SSNI.
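The noise-injection stage described above (score norm, then a reweighting to a per-sample $t^*$, then forward diffusion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the SSNI implementation: the toy score function (the exact score of a standard Gaussian, standing in for a pre-trained score network $s_\theta$), the hypothetical linear reweighting rule `reweight` with its `t_base`, `t_min`, and `t_max` values, and the DDPM-style linear noise schedule are all assumptions. The reverse generative process is model-specific and is omitted.

```python
import numpy as np

# Assumed DDPM-style linear beta schedule with T = 1000 discrete steps.
T = 1000
BETAS = np.linspace(1e-4, 0.02, T)
ALPHA_BARS = np.cumprod(1.0 - BETAS)  # ALPHA_BARS[t - 1] is abar_t for step t


def toy_score(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical stand-in for a pre-trained score network s_theta:
    the exact score of an isotropic Gaussian N(0, sigma^2 I)."""
    return -x / sigma**2


def reweight(score_norms: np.ndarray, t_base: int = 100,
             t_min: int = 20, t_max: int = 200) -> np.ndarray:
    """Hypothetical linear reweighting (not the paper's function):
    scale t_base by each norm relative to the batch mean, then clip."""
    ratio = score_norms / score_norms.mean()
    t = np.rint(t_base * ratio).astype(int)
    return np.clip(t, t_min, t_max)


def forward_diffuse(x0: np.ndarray, t: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Closed-form forward diffusion to a per-sample step t (1-indexed):
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = ALPHA_BARS[t - 1][:, None]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps


def sample_specific_noise_injection(x: np.ndarray, rng: np.random.Generator):
    norms = np.linalg.norm(toy_score(x), axis=1)  # one score norm per sample
    t_star = reweight(norms)                      # one noise level per sample
    return forward_diffuse(x, t_star, rng), t_star, norms


# Usage: the last four samples are shifted away from the toy "clean" distribution.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
x[4:] += 2.0
x_t, t_star, norms = sample_specific_noise_injection(x, rng)
print(t_star)
```

Because the hypothetical `reweight` is monotone in the score norm, samples that lie farther from the clean distribution receive a larger $t^*$, matching the intuition that less clean samples should receive more noise.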

Paper Structure

This paper contains 31 sections, 4 theorems, 50 equations, 7 figures, 12 tables, 3 algorithms.

Key Result

Lemma 11.2

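Let $p_t ({\mathbf{x}})$ denote the marginal probability density of ${\mathbf{x}}$ at time $t$. For any ${\mathbf{x}} \in {\mathcal{X}}$ and $0 \leq t \leq T$, the score function $\nabla_{{\mathbf{x}}} \log p_t ({\mathbf{x}})$ at time $t$ can be expressed as:
$$\nabla_{{\mathbf{x}}} \log p_t ({\mathbf{x}}) = \int p({\mathbf{x}}_0 | {\mathbf{x}}) \, \nabla_{{\mathbf{x}}} \log p_t ({\mathbf{x}} | {\mathbf{x}}_0) \, \mathrm{d}{\mathbf{x}}_0,$$
where $p({\mathbf{x}}_0 | {\mathbf{x}})$ is the posterior given by Bayes' rule:
$$p({\mathbf{x}}_0 | {\mathbf{x}}) = \frac{p_t ({\mathbf{x}} | {\mathbf{x}}_0) \, p_0 ({\mathbf{x}}_0)}{p_t ({\mathbf{x}})}.$$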
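Lemma 11.2 states that the marginal score is the posterior-weighted average of the conditional scores $\nabla_{\mathbf{x}} \log p_t(\mathbf{x} | \mathbf{x}_0)$. This can be checked numerically in one dimension. The sketch below assumes a Gaussian forward kernel $p_t(x | x_0) = \mathcal{N}(x; \sqrt{\bar\alpha}\, x_0, 1 - \bar\alpha)$, a toy two-component Gaussian mixture as the clean-data density $p_0$, and $\bar\alpha = 0.6$; all three are illustrative assumptions. Under them the marginal $p_t$ is again a Gaussian mixture with a closed-form score, which is compared with the posterior expectation computed by quadrature on a grid.

```python
import numpy as np


def gauss_pdf(x, mean, var):
    """Density of N(mean, var), broadcasting over array arguments."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


# Toy clean-data density p_0 (assumed): a two-component 1-D Gaussian mixture.
W = np.array([0.3, 0.7])   # mixture weights
M = np.array([-2.0, 1.5])  # component means
V = np.array([0.25, 0.5])  # component variances


def p0(x0):
    return sum(w * gauss_pdf(x0, m, v) for w, m, v in zip(W, M, V))


def marginal_score(x, a, s2):
    """Exact d/dx log p_t(x), where p_t(x) = sum_k w_k N(x; a m_k, a^2 v_k + s2)."""
    mu, var = a * M, a**2 * V + s2
    comp = W * gauss_pdf(x, mu, var)
    return np.sum(comp * (-(x - mu) / var)) / np.sum(comp)


def posterior_score(x, a, s2, grid):
    """E_{p(x0 | x)}[ d/dx log p_t(x | x0) ], with the posterior from Bayes' rule."""
    post = gauss_pdf(x, a * grid, s2) * p0(grid)  # p_t(x | x0) p_0(x0), unnormalized
    post /= post.sum()                             # uniform grid: the spacing cancels
    return np.sum(post * (-(x - a * grid) / s2))   # score of the Gaussian kernel in x


alpha_bar = 0.6
a, s2 = np.sqrt(alpha_bar), 1.0 - alpha_bar
grid = np.linspace(-10.0, 10.0, 20001)
for x in (-3.0, 0.0, 2.5):
    print(x, marginal_score(x, a, s2), posterior_score(x, a, s2, grid))
```

The two score columns agree up to quadrature error, as the lemma predicts.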

Figures (7)

  • Figure 1: In each sub-figure, the 1st column contains the input, which is either an adversarial example (AE) or a clean example (CE); the 2nd column contains noise-injected examples with different $t^*$s; and the 3rd column contains the purified examples. We use DiffPure (nie2022diffusion) with the sample-shared $t^* = 100$ selected by nie2022diffusion to conduct this experiment on CIFAR-10 (cifar). The globally shared $t^* = 100$ offers a baseline, but it yields suboptimal predictions compared with what can be achieved by tuning the noise level for individual samples. Notably, while the examples recovered with different noise levels may be visually indistinguishable, they carry different semantics. For instance, the image is classified as "frog" (incorrect) with $t^*=100$ but as "dog" (correct) with $t^*=60$ (see the second sub-figure). These observations highlight the need for sample-wise adjustment of the noise level.
  • Figure 2: Relationship between score norms and perturbation budgets. We use one batch of clean data from CIFAR-10 and employ PGD+EOT $\ell_\infty (\epsilon = 8/255)$ as the attack.
  • Figure 3: An overview of the proposed SSNI framework. SSNI introduces a novel sample-specific mechanism that adaptively adjusts the noise injection level for each sample, enhancing purification effectiveness in DBP methods. The process begins by feeding each sample image $x_i$ into the forward diffusion process of an off-the-shelf diffusion model. To determine how much noise to inject into each sample, SSNI employs a pre-trained score network $s_\theta$ to compute the score norm $\|s_\theta(x_i)\|$, which reflects how far the sample lies from the clean data distribution. Based on this score norm, a reweighting function $f$ adaptively determines the noise level $t^*_i$ for each sample. Finally, each sample is purified through a reverse diffusion process before being classified. Notably, SSNI is designed as a general framework rather than a specific method, so it can be integrated with a wide range of existing DBP methods.
  • Figure 4: Standard (top) and robust (bottom) accuracy (%) vs. $\tau$. We report the mean and standard deviation over three runs.
  • ...and 2 more figures
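Figure 1 contrasts $t^* = 100$ with $t^* = 60$. To make the difference in injected noise concrete, the sketch below evaluates the scales of the closed-form forward marginal $\mathbf{x}_{t} = \sqrt{\bar\alpha_{t}}\,\mathbf{x}_0 + \sqrt{1 - \bar\alpha_{t}}\,\boldsymbol{\epsilon}$ at both levels. It assumes a DDPM-style linear $\beta$ schedule from $10^{-4}$ to $0.02$ over $T = 1000$ discrete steps; the captions do not state the schedule, so the printed numbers are illustrative only.

```python
import numpy as np

# Assumed DDPM-style linear beta schedule over T = 1000 discrete steps.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bars[t - 1] is abar_t for step t


def noise_scales(t_star: int) -> tuple[float, float]:
    """Return (signal scale sqrt(abar_t), noise std sqrt(1 - abar_t)) at step t_star."""
    abar = alpha_bars[t_star - 1]
    return float(np.sqrt(abar)), float(np.sqrt(1.0 - abar))


for t_star in (60, 100):
    signal, noise = noise_scales(t_star)
    print(f"t* = {t_star}: signal scale {signal:.3f}, noise std {noise:.3f}")
```

Under this schedule a smaller $t^*$ keeps more of the input signal and injects less noise; this per-sample noise level is the quantity SSNI adjusts.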

Theorems & Definitions (13)

  • proof
  • proof
  • proof
  • Definition 11.1: Marginal Probability Density
  • Lemma 11.2
  • proof
  • Lemma 11.3
  • proof
  • Lemma 11.4
  • proof
  • ...and 3 more