Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification

Gaozheng Pei, Shaojie Lyu, Gong Chen, Ke Ma, Qianqian Xu, Yingfei Sun, Qingming Huang

TL;DR

This work addresses the challenge of purifying adversarial perturbations with diffusion models without sacrificing semantic content. It introduces a heterogeneous forward process guided by neural attention, applying stronger noise to regions the model relies on and lighter noise elsewhere, complemented by a two-stage heterogeneous denoising that performs inpainting-like restoration before standard diffusion sampling. To counter strong adaptive attacks, the method replaces multi-step resampling with a single-step, DDIM-like update, substantially reducing time and memory costs. Across CIFAR-10, SVHN, and ImageNet, the approach yields consistent improvements in standard and robust accuracy over prior diffusion-based and training-based defenses, while enabling feasible gradient-based evaluation on commodity GPUs.
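The heterogeneous forward process described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a standard DDPM linear beta schedule, a binary attention mask, and two hypothetical timesteps `t_high` and `t_low`. The function names are invented for this example, and the paper's exact noising rule may differ.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear DDPM schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def heterogeneous_forward(x0, mask, t_high, t_low, alpha_bar, rng):
    """Noise masked pixels to level t_high and the remaining pixels to t_low.

    x0   : image array, e.g. shape (C, H, W)
    mask : array broadcastable to x0; 1 where the classifier attends, else 0
    """
    eps = rng.standard_normal(x0.shape)
    a_high, a_low = alpha_bar[t_high], alpha_bar[t_low]
    # Closed-form forward diffusion: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps
    x_high = np.sqrt(a_high) * x0 + np.sqrt(1.0 - a_high) * eps
    x_low = np.sqrt(a_low) * x0 + np.sqrt(1.0 - a_low) * eps
    # Blend the two noise levels pixel-wise according to the mask
    return mask * x_high + (1.0 - mask) * x_low

# Hypothetical usage: the top half of a 32x32 image is "attended"
rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
image = rng.uniform(-1.0, 1.0, size=(3, 32, 32))
attention_mask = np.zeros((1, 32, 32))
attention_mask[:, :16, :] = 1.0
noisy = heterogeneous_forward(image, attention_mask, 400, 50, alpha_bar, rng)
```

Reusing one noise draw `eps` for both levels is a simplification made here for brevity. Drawing independent noise per region would be an equally plausible reading of the description.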

Abstract

Existing diffusion-based purification methods aim to disrupt adversarial perturbations by introducing a certain amount of noise through a forward diffusion process, followed by a reverse process to recover clean examples. However, this approach is fundamentally flawed: because the forward process operates uniformly across all pixels, it corrupts normal pixels while attempting to combat adversarial perturbations, causing the target model to produce incorrect predictions. Relying on low-intensity noise alone is insufficient for effective defense. To address this issue, we implement a heterogeneous purification strategy grounded in the interpretability of neural networks. Our method applies higher-intensity noise to the specific pixels that the target model focuses on, while the remaining pixels are subjected to only low-intensity noise. This requirement motivates us to redesign the sampling process of the diffusion model so that it can effectively remove varying noise levels. Furthermore, to make evaluation against strong adaptive attacks feasible, our method sharply reduces time cost and memory usage through single-step resampling. Extensive experiments across three datasets demonstrate that our method outperforms most current adversarial training and purification techniques by a substantial margin.
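For intuition about single-step resampling, the sketch below shows the standard deterministic DDIM update (eta = 0), which the TL;DR suggests the method's update resembles. It is not the paper's exact procedure. The function name `ddim_step` and the arguments `a_t` and `a_s` (cumulative signal levels at the current and target timesteps) are assumptions for this example, and `eps_pred` stands in for the output of a denoising network.

```python
import numpy as np

def ddim_step(x_t, eps_pred, a_t, a_s):
    """One deterministic DDIM update from signal level a_t to signal level a_s.

    x_t      : noisy sample at the current timestep
    eps_pred : noise estimate (here supplied directly instead of by a network)
    a_t, a_s : cumulative alpha-bar values in (0, 1]
    """
    # Estimate the clean sample implied by the current noise prediction
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps_pred) / np.sqrt(a_t)
    # Re-noise that estimate to the target level, reusing the same noise direction
    x_s = np.sqrt(a_s) * x0_hat + np.sqrt(1.0 - a_s) * eps_pred
    return x_s, x0_hat

# Sanity check with an oracle noise estimate: the clean sample is recovered
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
eps = rng.standard_normal(x0.shape)
a_t = 0.5
x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
x_s, x0_hat = ddim_step(x_t, eps, a_t, a_s=1.0)
```

Because the update needs only one noise estimate per step, an attacker's gradient has to pass through a single network call rather than a chain of resampling iterations. That is one plausible source of the reduced time and memory cost the abstract reports.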

Paper Structure

This paper contains 17 sections, 11 equations, 9 figures, 8 tables, 3 algorithms.

Figures (9)

  • Figure 1: When the noise added to adversarial samples is minimal (top), the adversarial perturbations remain intact and cannot be removed. On the other hand, if the noise intensity is excessive (bottom), it can distort the semantic information. Our approach employs an attention mask to introduce varying intensities of noise across different areas (middle). This technique effectively strikes a balance between preserving semantic information and mitigating adversarial perturbations.
  • Figure 2: Pipeline of our method. Given an adversarial image, we extract the attention maps of each block during the forward propagation of the classifier and construct an attention mask $\mathcal{M}$. We then run our redesigned denoising process of the diffusion model. The samples purified by our method are correctly classified.
  • Figure 3: Our method maintains semantic consistency between the masked areas and the surrounding pixel values when $U=10$. Note that no matter what $U$ is, our method requires only one additional call to the denoising network.
  • Figure 4: 2D purification trajectories of our method and DiffPure. For DiffPure (left), if $t^*=0.2$, the purification direction points toward the original class but the adversarial perturbation cannot be completely eliminated; if $t^*=0.4$, the semantic information changes and the purification direction no longer points toward the original class. Our method (right) eliminates the adversarial perturbation while retaining the semantic information.
  • Figure 5: Visualization of the attention map, the attention mask, and images purified by our method.
  • ...and 4 more figures