DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks

Caixin Kang; Yinpeng Dong; Zhengyi Wang; Shouwei Ruan; Yubo Chen; Hang Su; Xingxing Wei

TL;DR

This work tackles adversarial patch attacks by introducing DIFFender, a diffusion-based defense that leverages a discovered Adversarial Anomaly Perception (AAP) phenomenon. A single pre-trained diffusion model is used to both localize patches (via differences in denoised outputs at a fixed noise level $t^*$) and restore the affected regions, guided by text prompts and vision-language pre-training. The approach employs a lightweight few-shot prompt-tuning strategy with learnable prompts $prompt_{L}$ and $prompt_{R}$ and a joint loss $L_{PT}$ to optimize localization and restoration without extensive retraining. Extensive experiments across ImageNet, face recognition, and physical-world scenarios demonstrate strong robustness under adaptive attacks and good generalization to unseen attacks and classifiers.

Abstract

Adversarial attacks, particularly patch attacks, pose significant threats to the robustness and reliability of deep learning models. Developing reliable defenses against patch attacks is crucial for real-world applications. This paper introduces DIFFender, a novel defense framework that harnesses the capabilities of a text-guided diffusion model to combat patch attacks. Central to our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which empowers the diffusion model to detect and localize adversarial patches through the analysis of distributional discrepancies. DIFFender integrates dual tasks of patch localization and restoration within a single diffusion model framework, utilizing their close interaction to enhance defense efficacy. Moreover, DIFFender utilizes vision-language pre-training coupled with an efficient few-shot prompt-tuning algorithm, which streamlines the adaptation of the pre-trained diffusion model to defense tasks, thus eliminating the need for extensive retraining. Our comprehensive evaluation spans image classification and face recognition tasks, extending to real-world scenarios, where DIFFender shows good robustness against adversarial attacks. The versatility and generalizability of DIFFender are evident across a variety of settings, classifiers, and attack methodologies, marking an advancement in adversarial patch defense strategies.
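The localization idea described above (comparing several denoised versions of the same adversarial image and looking for regions where they disagree) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function name `difference_map` is hypothetical, and the "denoised" samples below are synthetic stand-ins generated with NumPy rather than outputs of a real diffusion model.

```python
import itertools
import numpy as np


def difference_map(denoised):
    """Mean absolute difference over all pairs of denoised images.

    denoised: sequence of float arrays shaped (H, W, C), e.g. several
    denoising runs of one adversarial image at the same noise ratio t*.
    Returns an (H, W) array; large values mark regions where the runs disagree.
    """
    pairs = list(itertools.combinations(denoised, 2))
    if not pairs:
        raise ValueError("need at least two denoised images")
    # Per-pair absolute difference, averaged over the channel axis.
    diffs = [np.abs(a - b).mean(axis=-1) for a, b in pairs]
    return np.mean(diffs, axis=0)


# Synthetic stand-ins (assumption): three "denoised" samples that agree
# almost everywhere but vary strongly inside an 8x8 "patch" region.
rng = np.random.default_rng(0)
base = rng.random((32, 32, 3))
samples = []
for _ in range(3):
    img = base + 0.01 * rng.standard_normal(base.shape)
    img[8:16, 8:16] += 0.5 * rng.standard_normal((8, 8, 3))
    samples.append(img)

diff = difference_map(samples)

# Compare the average disagreement inside and outside the patch region.
patch = np.zeros(diff.shape, dtype=bool)
patch[8:16, 8:16] = True
inside = diff[patch].mean()
outside = diff[~patch].mean()
```

In this toy setup `inside` is much larger than `outside` by construction; with a real diffusion model, the AAP phenomenon is the claim that adversarial patch regions behave this way.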

Paper Structure (13 sections, 6 equations, 8 figures, 8 tables)

This paper contains 13 sections, 6 equations, 8 figures, 8 tables.

Introduction
Related work
Methodology
Discovery of the AAP Phenomenon
DIFFender
Prompt Tuning
Experiments
Experimental settings
Evaluation on ImageNet
Ablation studies and additional results
Extension in Face Recognition
Extension in Physical World
Discussion and Conclusion

Figures (8)

Figure 1: The intriguing phenomenon of the diffusion model. A diffusion model is applied multiple times to the given adversarial image, and the differences between any two denoised images are pronounced within the adversarial patch regions, which can be leveraged to pinpoint the location of adversarial patches.
Figure 2: Pipeline of DIFFender. DIFFender leverages a unified diffusion model to jointly guide the localization and restoration of adversarial patch attacks, and combines a prompt-tuning module to facilitate efficient tuning.
Figure 3: Denoised results by diffusion model at different noise ratios. With small ratios ($t^* = 0.15/0.5$), the patch cannot be purified; conversely, the global structure becomes lost with large ratios ($t^* = 0.7/0.9$).
Figure 4: In the analysis of ImageNet images, we find a pronounced difference specifically within regions affected by adversarial patches. This observation provides empirical evidence supporting the AAP phenomenon.
Figure 5: To obtain the final refined mask, the estimated differences are binarized and then processed with Gaussian smoothing and dilation operations.
...and 3 more figures
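
The mask refinement summarized in the Figure 5 caption (binarize, Gaussian smoothing, dilation) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the threshold, the Gaussian `sigma`, the kernel radii, and all function names are hypothetical choices for illustration, and the paper's actual values and implementation may differ.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def smooth(mask, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter along rows, then along columns."""
    k = gaussian_kernel(sigma, radius)
    padded = np.pad(mask, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def dilate(mask, radius=1):
    """Dilation with a square window, computed as a sliding maximum."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=0)
    out = np.zeros_like(mask)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def refine_mask(diff, threshold=0.5, sigma=1.0, blur_radius=2, dilate_radius=1):
    """Binarize a difference map, smooth it, then dilate it (assumed order)."""
    binary = (diff > threshold).astype(float)
    soft = smooth(binary, sigma, blur_radius)
    return dilate(soft, dilate_radius)


# Toy difference map (assumption): a 4x4 high-difference block on a flat background.
diff = np.zeros((16, 16))
diff[5:9, 5:9] = 1.0
mask = refine_mask(diff)
```

The result is a soft mask in [0, 1] that covers a slightly larger area than the thresholded block, which is the usual motivation for smoothing and dilating: the mask should fully enclose the patch before the region is restored.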
