Real-world Adversarial Defense against Patch Attacks based on Diffusion Model

Xingxing Wei; Caixin Kang; Yinpeng Dong; Zhengyi Wang; Shouwei Ruan; Yubo Chen; Hang Su

TL;DR

This work addresses adversarial patch robustness by introducing DIFFender, a diffusion-model-based defense that localizes adversarial patches and restores the affected regions by exploiting the Adversarial Anomaly Perception (AAP) phenomenon. By treating localization and restoration as a unified diffusion process guided by text prompts, and enabling rapid adaptation via few-shot prompt tuning, DIFFender achieves strong robustness against adaptive patch attacks in the visible domain. The method extends to infrared imagery with an Infrared Domain Constrained (IDC) Token and infrared-specific losses, enabling cross-modal defense in a unified framework. Empirical results on ImageNet, LFW, and LLVIP demonstrate substantial improvements in robust accuracy and practical viability, including real-world and physical-world scenarios.

Abstract

Adversarial patches present significant challenges to the robustness of deep learning models, making the development of effective defenses critical for real-world applications. This paper introduces DIFFender, a novel DIFfusion-based DeFender framework that leverages the power of a text-guided diffusion model to counter adversarial patch attacks. At the core of our approach is the discovery of the Adversarial Anomaly Perception (AAP) phenomenon, which enables the diffusion model to accurately detect and locate adversarial patches by analyzing distributional anomalies. DIFFender seamlessly integrates the tasks of patch localization and restoration within a unified diffusion model framework, enhancing defense efficacy through their close interaction. Additionally, DIFFender employs an efficient few-shot prompt-tuning algorithm, facilitating the adaptation of the pre-trained diffusion model to defense tasks without the need for extensive retraining. Our comprehensive evaluation, covering image classification and face recognition tasks, as well as real-world scenarios, demonstrates DIFFender's robust performance against adversarial attacks. The framework's versatility and generalizability across various settings, classifiers, and attack methodologies mark a significant advancement in adversarial patch defense strategies. Beyond the widely studied visible domain, we identify another advantage of DIFFender: it can be easily extended to the infrared domain. Consequently, we demonstrate the flexibility of DIFFender, which can defend against both infrared and visible adversarial patch attacks within a single universal defense framework.

Paper Structure (25 sections, 11 equations, 12 figures, 12 tables)

This paper contains 25 sections, 11 equations, 12 figures, 12 tables.

Introduction
Related Works
Adversarial Attacks
Adversarial Defenses
Infrared Adversarial Attacks and Defenses
Methodology
Discovery of the AAP Phenomenon
DIFFender
Prompt Tuning
Extension to the Infrared Domain
Infrared Domain Constrained Token
Loss Functions for Infrared Domain
Prompt Tuning for Infrared Domain
Experiments in the Visible Domain
Experimental Settings
...and 10 more sections

Figures (12)

Figure 1: An intriguing phenomenon of the diffusion model. When the model is applied multiple times to the same adversarial image, the differences between any two of the resulting denoised images are particularly pronounced within the regions containing adversarial patches. This characteristic can be exploited to more accurately identify the location of these patches.
Figure 2: Pipeline of DIFFender. DIFFender utilizes a unified diffusion model to seamlessly coordinate the localization and restoration of adversarial patch attacks, integrating a prompt-tuning module to enable efficient and precise tuning.
Figure 3: Denoised results at different noise ratios. With smaller ratios ($t^* = 0.15/0.5$), the patch remains unpurified; however, with larger ratios ($t^* = 0.7/0.9$), the global structure is compromised.
Figure 4: In our analysis of ImageNet images, we observe a pronounced difference specifically within regions impacted by adversarial patches, offering empirical evidence in support of the AAP phenomenon.
Figure 5: To refine the mask, the estimated differences are binarized, followed by Gaussian smoothing and dilation operations.
...and 7 more figures
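
The captions of Figures 1 and 5 together describe a localization recipe: compare several denoised versions of the same adversarial image, then binarize, smooth, and dilate the difference map. The following is a minimal sketch of that recipe, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the threshold, the kernel radius and sigma, the use of a max filter as the dilation step, and the tiny synthetic grayscale images that stand in for real diffusion outputs (plain Python lists are used only to keep the sketch self-contained).

```python
import math
from itertools import combinations

Image = list[list[float]]  # grayscale image as rows of pixels (illustrative stand-in for a tensor)


def mean_pairwise_difference(denoised: list[Image]) -> Image:
    """Average absolute difference over every pair of denoised images."""
    height, width = len(denoised[0]), len(denoised[0][0])
    pairs = list(combinations(denoised, 2))
    return [
        [sum(abs(a[y][x] - b[y][x]) for a, b in pairs) / len(pairs) for x in range(width)]
        for y in range(height)
    ]


def binarize(img: Image, threshold: float) -> Image:
    """Set pixels above the threshold to 1.0 and all others to 0.0."""
    return [[1.0 if v > threshold else 0.0 for v in row] for row in img]


def _blur_rows(img: Image, kernel: list[float]) -> Image:
    """1-D convolution along each row, replicating edge pixels at the border."""
    radius, width = len(kernel) // 2, len(img[0])
    return [
        [
            sum(k * row[min(max(x + i - radius, 0), width - 1)] for i, k in enumerate(kernel))
            for x in range(width)
        ]
        for row in img
    ]


def _transpose(img: Image) -> Image:
    return [list(col) for col in zip(*img)]


def gaussian_smooth(img: Image, radius: int = 1, sigma: float = 1.0) -> Image:
    """Separable Gaussian blur: one pass along rows, one along columns."""
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    kernel = [w / sum(weights) for w in weights]
    horizontal = _blur_rows(img, kernel)
    return _transpose(_blur_rows(_transpose(horizontal), kernel))


def dilate(img: Image, radius: int = 1) -> Image:
    """Dilation as a max filter over a square window (assumed structuring element)."""
    height, width = len(img), len(img[0])
    return [
        [
            max(
                img[yy][xx]
                for yy in range(max(y - radius, 0), min(y + radius + 1, height))
                for xx in range(max(x - radius, 0), min(x + radius + 1, width))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def refine_mask(diff: Image, threshold: float = 0.25) -> Image:
    """Binarize, then Gaussian-smooth, then dilate, in the order Figure 5 lists."""
    return dilate(gaussian_smooth(binarize(diff, threshold)))


def synthetic_denoised(patch_value: float, size: int = 6) -> Image:
    """Hypothetical denoiser output: constant background, varying 2x2 'patch' region."""
    img = [[0.5] * size for _ in range(size)]
    for y in (2, 3):
        for x in (2, 3):
            img[y][x] = patch_value
    return img


# Three synthetic runs that agree everywhere except inside the patch region.
denoised = [synthetic_denoised(v) for v in (0.0, 0.4, 0.8)]
diff = mean_pairwise_difference(denoised)
mask = refine_mask(diff)
```

In this toy setup the difference map is zero on the background and nonzero only inside the 2x2 region where the runs disagree, and the refined soft mask covers that region plus a one-pixel margin added by the dilation. A real pipeline would obtain `denoised` from repeated diffusion denoising passes and would pick the threshold and kernel sizes empirically.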
