
ZeroPur: Succinct Training-Free Adversarial Purification

Erhu Liu, Zonglin Yang, Bo Liu, Bin Xiao, Xiuli Bi

TL;DR

ZeroPur tackles the problem of defending against unseen adversarial attacks without retraining or external purification models. It introduces Guided Shift (GS) and Adaptive Projection (AP), a two-stage, training-free purification pipeline that leverages the natural image manifold hypothesis and uses a blur-derived guidance direction to estimate a purification trajectory $-\tilde{e}(\bm{\nu})$ and project along that direction to restore the embedding toward the manifold. GS uses the gradient of the cosine distance between the adversarial embedding and its blurred counterpart, while AP optimizes a constrained objective across selected layers to maximize projection along the GS-derived direction with perceptual regularization via LPIPS. Across CIFAR-10/100 and ImageNet-1K, ZeroPur achieves robust accuracy comparable to or exceeding state-of-the-art purification methods, notably outperforming AT/ABP while avoiding retraining costs, though it remains susceptible to strong adaptive attacks and may benefit from diffusion-based extensions in the future.
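Guided Shift can be sketched as a procedure. The sketch below uses NumPy and is not the authors' implementation. It makes these simplifying assumptions:

- The victim classifier's embedding is replaced by a hypothetical linear map `W`.
- The blur is a 3x3 box filter rather than the paper's kernel.
- `step` and `n_steps` are illustrative values.

The blurred image's embedding serves as a fixed target. The image then takes gradient-descent steps on the cosine distance to that target, which pulls the adversarial embedding toward its blurred counterpart.

```python
import numpy as np


def box_blur(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge padding (stand-in for the paper's blur)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def cosine_distance(u: np.ndarray, v: np.ndarray) -> float:
    """1 - cos(u, v)."""
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))


def guided_shift(x_adv: np.ndarray, W: np.ndarray, step: float = 0.1, n_steps: int = 20) -> np.ndarray:
    """Hypothetical GS loop for a linear embedding e(x) = W @ x.ravel().

    The blurred image's embedding is held fixed as the guidance target, and x is
    moved down the gradient of the cosine distance between e(x) and that target.
    """
    target = W @ box_blur(x_adv).ravel()
    x = x_adv.astype(float).copy()
    for _ in range(n_steps):
        u = W @ x.ravel()
        nu, nt = np.linalg.norm(u), np.linalg.norm(target)
        # Gradient of cos(u, t) w.r.t. u: t/(|u||t|) - (u.t) u / (|u|^3 |t|)
        grad_cos_u = target / (nu * nt) - (u @ target) * u / (nu ** 3 * nt)
        # Cosine distance is 1 - cos, so its gradient w.r.t. x is -W^T grad_cos_u
        grad_dist_x = -(W.T @ grad_cos_u).reshape(x.shape)
        x -= step * grad_dist_x
    return x
```

With a small step, each iteration should lower the cosine distance to the blurred target. In a real network, the gradient would come from automatic differentiation rather than the closed form used here.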

Abstract

Adversarial purification is a defense technique that can counter various unseen adversarial attacks without modifying the victim classifier. Existing methods often depend on external generative models or on cooperation between auxiliary functions and victim classifiers. However, retraining generative models, auxiliary functions, or victim classifiers ties them to the domain of the fine-tuning dataset and is computationally expensive. In this work, we assume that adversarial images are outliers of the natural image manifold, so purification can be viewed as returning them to this manifold. Following this assumption, we present ZeroPur, a simple adversarial purification method that requires no further training. ZeroPur consists of two steps. Given an adversarial example, Guided Shift obtains a shifted embedding of the example under the guidance of its blurred counterpart. Adaptive Projection then constructs a directional vector from this shifted embedding to provide momentum, adaptively projecting adversarial images onto the manifold. ZeroPur is independent of external models and requires no retraining of victim classifiers or auxiliary functions; it relies solely on the victim classifier itself to achieve purification. Extensive experiments on three datasets (CIFAR-10, CIFAR-100, and ImageNet-1K) with various classifier architectures (ResNet, WideResNet) demonstrate that our method achieves state-of-the-art robust performance. The code will be made publicly available.
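Adaptive Projection can be sketched in a similar spirit as projected gradient ascent. Every detail here is an assumption for illustration, not the paper's formulation:

- Each selected layer is a hypothetical linear map `W_k`.
- Each layer has a given direction `d_k`, which stands in for the GS-derived direction.
- A squared-L2 penalty weighted by `lam` replaces the LPIPS perceptual regularizer.
- The only constraint enforced is clipping to the [0, 1] pixel range.

```python
import numpy as np


def adaptive_projection(x_init, layers, directions, lam=1.0, step=0.01, n_steps=50):
    """Hypothetical AP loop with linear per-layer embeddings e_k(x) = W_k @ x.

    Ascends  sum_k <e_k(x) - e_k(x_init), d_k / |d_k|>  -  lam * ||x - x_init||^2,
    where the squared-L2 term stands in for the LPIPS regularizer, and clips x
    to the valid pixel range [0, 1] after every step.
    """
    x0 = x_init.astype(float).ravel()
    x = x0.copy()
    units = [d / np.linalg.norm(d) for d in directions]
    # For linear layers, the projection term has the constant gradient sum_k W_k^T d_k.
    proj_grad = sum(W.T @ u for W, u in zip(layers, units))
    for _ in range(n_steps):
        grad = proj_grad - 2.0 * lam * (x - x0)
        x = np.clip(x + step * grad, 0.0, 1.0)
    return x.reshape(x_init.shape)


def objective(x, x_init, layers, directions, lam=1.0):
    """Value of the hypothetical AP objective above (0 at x = x_init)."""
    dx = (x - x_init).ravel()
    proj = sum(float((W @ dx) @ (d / np.linalg.norm(d))) for W, d in zip(layers, directions))
    return proj - lam * float(dx @ dx)
```

This toy objective is concave, and its gradient is Lipschitz with constant 2·lam. Any step no larger than 1/(2·lam) therefore keeps the objective from decreasing. A real LPIPS term and nonlinear layers would need automatic differentiation and a tuned step size.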

Paper Structure

This paper contains 15 sections, 2 theorems, 21 equations, 8 figures, 7 tables, and 2 algorithms.

Key Result

Theorem 1

Suppose $f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{d}$ is twice differentiable at point $\bm{x}$. Let $e_{1} = \nabla f_{1}(\bm{x})^{T}\bm{\delta} + \frac{1}{2}\bm{\delta}^{T}\nabla^{2}f_{1}(\bar{\bm{x}})\bm{\delta}$, where there exists $w_{1} \in [0, 1]$ such that $\bar{\bm{x}} = w_{1}\bm{x} + (1 - w_{1})(\bm{x} + \bm{\delta})$. … where there exists $w_{l} \in [0, 1]$ such that $\bar{f}_{1...l-1}(\bm{x}) = w_{l}f_{1...l-1}(\bm{x}) + (1 - w_{l})f_{1...l-1}(\bm{x} + \bm{\delta})$. …
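The $e_{1}$ term is a second-order Taylor expansion of the first layer's output change, with the Hessian evaluated at an intermediate point. The following NumPy check illustrates this for the toy scalar layer $f_{1}(\bm{x}) = \mathrm{softplus}(\bm{a}^{T}\bm{x})$. This layer is chosen purely for illustration and does not come from the paper. The check reads $\bar{\bm{x}}$ as a point on the segment between $\bm{x}$ and $\bm{x} + \bm{\delta}$, i.e. $\bar{\bm{x}} = w_{1}\bm{x} + (1 - w_{1})(\bm{x} + \bm{\delta})$. The hypothetical helper `find_w1` grid-searches for a $w_{1} \in [0, 1]$ that makes the expansion match the exact change.

```python
import numpy as np


def softplus(z):
    return np.log1p(np.exp(z))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def find_w1(a, x, delta, grid=100_001):
    """Hypothetical helper: grid-search w1 in [0, 1] so that
    grad f1(x)^T delta + 0.5 * delta^T H(x_bar) delta  ~=  f1(x + delta) - f1(x),
    with x_bar = w1 * x + (1 - w1) * (x + delta).
    Toy layer (assumption, not from the paper): f1(x) = softplus(a @ x)."""
    exact = softplus(a @ (x + delta)) - softplus(a @ x)
    first = sigmoid(a @ x) * (a @ delta)               # grad f1(x)^T delta
    w = np.linspace(0.0, 1.0, grid)
    z_bar = a @ x + (1.0 - w) * (a @ delta)             # a @ x_bar for every candidate w1
    s = sigmoid(z_bar)
    second = 0.5 * s * (1.0 - s) * (a @ delta) ** 2     # 0.5 * delta^T H(x_bar) delta
    gap = np.abs(first + second - exact)
    i = int(np.argmin(gap))
    return w[i], first + second[i], exact
```

Taylor's theorem with the Lagrange remainder guarantees that such a $w_{1}$ exists, so the gap between the expansion and the exact change should be close to zero. The grid search is only a convenient way to locate $w_{1}$.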

Figures (8)

  • Figure 1: An illustration of ZeroPur.
  • Figure 2: The impact of adversarial samples and their correction through different operations, visualized using t-SNE. (a) AutoAttack croce2020reliable causes adversarial samples to deviate from the manifold; (b) Gaussian Blur, (c) Guided Shift, and (d) ZeroPur (Ours) reverse this deviation to varying degrees. (e) Feature trajectories under adversarial attack and the blur operation, indicating that a simple blurring operator can provide a promising direction toward the natural image manifold.
  • Figure 3: Guided Shift
  • Figure 4: Intuitive explanation of AP in the context of adversarial or natural $\bm{x}_{\mathrm{init}}$. PR: the perceptual (LPIPS-based) regularization term in the AP objective.
  • Figure 5: Standard accuracy (SA) and robust accuracy (RA) (%) of classifiers trained with three-level data augmentation on (a) ResNet-18 and (b) WideResNet-28-10. (c) Ablation study of perceptual regularization on CIFAR-10. (d) Robust accuracy against DI$^{2}$-FGSM and AutoAttack on CIFAR-10 by Blur and TVM.

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 1
  • Proof 1