Instant Adversarial Purification with Adversarial Consistency Distillation

Chun Tong Lei; Hon Ming Yam; Zhongliang Guo; Yifei Qian; Chun Pong Lau

TL;DR

This work tackles the computational bottleneck of diffusion-based adversarial purification by introducing One Step Control Purification (OSCP), a framework that achieves robust purification in a single neural function evaluation (NFE). OSCP combines Gaussian Adversarial Noise Distillation (GAND), which learns a denoising trajectory that handles both Gaussian and adversarial noise, with the Controlled Adversarial Purification (CAP) inference pipeline, which uses non-learnable edge guidance to preserve semantic content when large purification steps are used. The approach delivers state-of-the-art robustness on ImageNet (74.19% robust accuracy) at about 0.1s per image, and it shows strong cross-architecture transfer and better image-quality preservation than prior diffusion-based purifiers. These results suggest that diffusion-based defenses can be made practical for time-critical applications that require a rapid defense against adversarial threats.
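The single-evaluation idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it works in pixel space rather than the latent space the abstract mentions, uses the standard DDPM forward-noising formula, and `toy_denoiser`, `one_step_purify`, and the value `alpha_bar_t=0.9` are hypothetical placeholders standing in for the distilled network and the chosen purification step.

```python
import numpy as np


def forward_diffuse(x0, alpha_bar_t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def one_step_purify(x_adv, denoiser, alpha_bar_t, rng):
    """Noise the input once, then map it back with a single denoiser call (1 NFE)."""
    x_t = forward_diffuse(x_adv, alpha_bar_t, rng)
    return denoiser(x_t, alpha_bar_t)


def toy_denoiser(x_t, alpha_bar_t):
    # Hypothetical placeholder: only undoes the signal scaling.
    # A real system would call a distilled one-step network here.
    return x_t / np.sqrt(alpha_bar_t)


rng = np.random.default_rng(0)
x_adv = rng.uniform(0.0, 1.0, size=(3, 8, 8))  # stand-in for an adversarial image
purified = one_step_purify(x_adv, toy_denoiser, alpha_bar_t=0.9, rng=rng)
```

The point of the sketch is the call count: `denoiser` is invoked exactly once per image, whereas an iterative purifier would loop over many timesteps.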

Abstract

Neural networks have revolutionized numerous fields with their exceptional performance, yet they remain susceptible to adversarial attacks through subtle perturbations. While diffusion-based purification methods like DiffPure offer promising defense mechanisms, their computational overhead is a significant practical limitation. In this paper, we introduce One Step Control Purification (OSCP), a novel defense framework that achieves robust adversarial purification in a single Neural Function Evaluation (NFE) within diffusion models. We propose Gaussian Adversarial Noise Distillation (GAND) as the distillation objective and Controlled Adversarial Purification (CAP) as the inference pipeline, which together make OSCP highly efficient while maintaining defense efficacy. GAND addresses a fundamental tension between consistency distillation and adversarial perturbation, bridging the gap between the natural and adversarial manifolds in the latent space. It remains computationally efficient by using Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, which avoids the high computational cost of full-parameter fine-tuning. CAP guides the purification process with a non-learnable edge detection operator computed from the input image and supplied as an extra prompt, which effectively prevents the purified images from deviating from their original appearance when large purification steps are used. Our experimental results on ImageNet showcase OSCP's superior performance, achieving a 74.19% defense success rate with merely 0.1s per purification, a 100-fold speedup over conventional approaches.
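As a rough illustration of what a non-learnable edge detection operator looks like, the sketch below computes a Sobel gradient-magnitude map in NumPy. The abstract does not say which operator CAP uses, so Sobel is an assumption chosen for brevity, and the function name `sobel_edges` is hypothetical.

```python
import numpy as np


def sobel_edges(gray):
    """Gradient magnitude of a 2-D grayscale image using fixed Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = gray.shape
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(gray.astype(float), 1, mode="edge")
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            window = padded[i:i + h, j:j + w]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    return np.hypot(gx, gy)


# Toy image with a vertical step edge between columns 2 and 3.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
edges = sobel_edges(img)
```

The kernels are fixed constants with no trainable parameters, so the edge map depends only on the input image. In a CAP-style pipeline a map like this would be passed to the purifier as an additional conditioning signal.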

Paper Structure (28 sections, 23 equations, 10 figures, 11 tables, 1 algorithm)

Introduction
Related Work
Adversarial training
Adversarial Purification
Diffusion Models
Efficient Diffusion Models
Preliminaries
Diffusion model
Diffusion-Based Purification (DBP)
Consistency Model
Method
One Step Control Purification
Problem Definition
Gaussian Adversarial Noise Distillation
Controlled Adversarial Purification
...and 13 more sections

Figures (10)

Figure 1: Comparison between existing methods and our proposed approach. Our method achieves superior performance with just a single inference step, significantly reducing computational cost. Through adversarial noise-adapted U-Net fine-tuning, we demonstrate better detail preservation after denoising, as evident in the zoomed-in circled regions (bottom). This makes OSCP an efficient and practical solution for adversarial purification.
Figure 2: The pipeline of our proposed OSCP. (a) Left: adversarial images, which are crafted through intentional attacks, exhibit a shifted distribution after the diffusion process that deviates from the standard normal distribution. In response, our proposed GAND learns to recover the attacked images by modeling this additional adversarial noise with LoRA. (b) Right: our proposed CAP uses non-learnable edge detection operators to guide the purification of adversarial samples, avoiding the potential inductive bias introduced by neural networks. Notably, our method achieves this performance with a single U-Net inference step.
Figure 3: CAP uses the edge image of the adversarial image to control the purification process, preserving as much semantic information as possible in the purified image.
Figure 4: Visualization of the IQA experiment comparing DiffPure with the proposed method. (a) Input image. (b) Adversarial image. (c) DiffPure. (d) Ours.
Figure 5: Performance of our method for different $t^\ast$ under PGD-100 $L_\infty$ ($\gamma = 4/255$, $\eta = 0.01 \times 4/255$), evaluated with ResNet50 on ImageNet.
...and 5 more figures
