DiffPAD: Denoising Diffusion-based Adversarial Patch Decontamination

Jia Fu, Xiao Zhang, Sepideh Pashami, Fatemeh Rahimian, Anders Holst

TL;DR

DiffPAD is a novel framework that harnesses diffusion models for adversarial patch decontamination. It not only achieves state-of-the-art adversarial robustness against patch attacks but also recovers naturalistic images without patch remnants.

Abstract

In the ever-evolving adversarial machine learning landscape, developing effective defenses against patch attacks has become a critical challenge, necessitating reliable solutions to safeguard real-world AI systems. Although diffusion models have shown remarkable capacity in image synthesis and have recently been used to counter $\ell_p$-norm bounded attacks, their potential in mitigating localized patch attacks remains largely underexplored. In this work, we propose DiffPAD, a novel framework that harnesses the power of diffusion models for adversarial patch decontamination. DiffPAD first performs super-resolution restoration on downsampled input images, then adopts binarization, a dynamic thresholding scheme, and a sliding window for effective localization of adversarial patches. This design is inspired by the theoretically derived correlation between patch size and diffusion restoration error, which generalizes across diverse patch attack scenarios. Finally, DiffPAD applies inpainting techniques to the original input images with the estimated patch region masked. By integrating closed-form solutions for super-resolution restoration and image inpainting into the conditional reverse sampling process of a pre-trained diffusion model, DiffPAD obviates the need for text guidance or fine-tuning. Through comprehensive experiments, we demonstrate that DiffPAD not only achieves state-of-the-art adversarial robustness against patch attacks but also excels in recovering naturalistic images without patch remnants. The source code is available at https://github.com/JasonFu1998/DiffPAD.
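
The localization stage described in the abstract (binarization, a dynamic threshold, and a sliding window) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the diffusion restoration step is omitted, the function names are hypothetical, the threshold is assumed to be an affine function of the mean restoration error with placeholder constants `slope` and `intercept`, and the patch is assumed to be a square of known side `window`.

```python
import numpy as np


def restoration_error(original, restored):
    """Per-pixel absolute error between the input and its restored version."""
    err = np.abs(original.astype(np.float64) - restored.astype(np.float64))
    # Average over the channel axis for H x W x C images.
    return err.mean(axis=-1) if err.ndim == 3 else err


def dynamic_threshold(error_map, slope, intercept):
    """Assumed affine rule: the threshold grows linearly with the mean error.

    `slope` and `intercept` are hypothetical calibration constants.
    """
    return slope * float(error_map.mean()) + intercept


def locate_patch(binary_map, window):
    """Top-left corner of the window x window square with the most flagged pixels."""
    flags = np.asarray(binary_map).astype(np.int64)
    h, w = flags.shape
    # Integral image with a zero row and column prepended.
    integral = np.zeros((h + 1, w + 1), dtype=np.int64)
    integral[1:, 1:] = flags.cumsum(axis=0).cumsum(axis=1)
    # Sum of every window x window block via four integral-image lookups.
    sums = (integral[window:, window:] - integral[:-window, window:]
            - integral[window:, :-window] + integral[:-window, :-window])
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return int(row), int(col)


def patch_mask(shape, top_left, window):
    """Boolean mask that is True inside the estimated patch region."""
    mask = np.zeros(shape, dtype=bool)
    r, c = top_left
    mask[r:r + window, c:c + window] = True
    return mask


# Toy check on synthetic data (not from the paper).
clean = np.zeros((32, 32))
attacked = clean.copy()
attacked[10:18, 20:28] = 1.0              # synthetic 8 x 8 "patch"
err = restoration_error(attacked, clean)  # pretend restoration recovered `clean`
thr = dynamic_threshold(err, slope=1.0, intercept=0.1)
corner = locate_patch(err > thr, window=8)
mask = patch_mask(err.shape, corner, window=8)
print(corner, int(mask.sum()))  # (10, 20) 64
```

In a full pipeline, the resulting mask would be passed to an inpainting step applied to the original input. The integral image keeps the sliding-window search linear in the number of pixels, regardless of window size.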

Paper Structure

This paper contains 11 sections, 1 theorem, 14 equations, 4 figures, 6 tables.

Key Result

Theorem 1

Assume $\|\bm\epsilon_\theta\left(\bm{x}_t\right)\|\leq C_\epsilon\sqrt{1-\bar{\alpha}_t}$ and let $\gamma:=\int_0^{T} \beta_t \mathrm{d} t$. With probability at least $1-\xi$, the $\ell_2$ distance between the diffusion-purified image $\hat{\bm{x}^a}$ with adversarial patch and the corresponding cl where $\varepsilon$ is the $\ell_2$-norm bound of the patch, $C_\xi:=\sqrt{2 d+4 \sqrt{d \log \frac

Figures (4)

  • Figure 1: The overall pipeline of DiffPAD, which follows steps numbered from 1 to 7 in order. Text and blocks in turquoise, pink and yellow correspond to the conditional diffusion restoration module, patch localization module and image degradation operations, respectively. The input of DiffPAD is the adversarial patch contaminated image $\bm{x}^a$ (with red frame), and the output is the decontaminated image $\bm{x}_0^i$ (with green frame).
  • Figure 2: Illustration of the linear relationship between diffusion restoration errors and optimal thresholds for patch localization under various attacks. In particular, we vary the size of the adversarial patches generated by different attacks on various model architectures.
  • Figure 3: Three examples of the visual effect on adversarial patches before and after applying different patch defenses. Note that it is difficult to find any trace of the adversarial patch in the images decontaminated by DiffPAD.
  • Figure 4: The performance of DiffPAD on a facial recognition task with VGG Face. We run DiffPAD twice to obtain two well-restored samples.

Theorems & Definitions (1)

  • Theorem 1