Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Shuhai Zhang, Jiahao Yang, Hui Luo, Jie Chen, Li Wang, Feng Liu, Bo Han, Mingkui Tan

TL;DR

CMAP tackles adversarial robustness by purifying inputs through latent-space optimization in a pre-trained consistency model, pulling samples back onto the clean data manifold. It introduces a perceptual consistency restoration loss $\mathcal{L}_a$ and a latent distribution constraint $\mathcal{L}_d$, plus a latent-vector voting scheme that stabilizes predictions across multiple latent samples. The defense is evaluated against a tailored consistency-disruption attack, and experiments on CIFAR-10 and ImageNet-100 show that CMAP achieves strong robust accuracy with minimal loss in standard accuracy, outperforming diffusion-based purification methods. By shifting purification to the latent space and enforcing distributional alignment, CMAP offers an attack-agnostic purification framework that inherits the deterministic, single-step generation of consistency models.
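
The latent-vector voting step can be pictured as a plain majority vote over the labels that the classifier assigns to the images generated from each optimized latent vector (Figure 2(b)). The sketch below is a minimal illustration assuming the classifier logits are already collected in a (K, num_classes) array; the function name `vote_prediction` and the lowest-index tie-break are hypothetical choices, not details confirmed by the paper.

```python
import numpy as np


def vote_prediction(logits_per_sample: np.ndarray) -> int:
    """Majority vote over classifier outputs for K purified samples.

    logits_per_sample has shape (K, num_classes): one row per image generated
    from an optimized latent vector (hypothetical input layout).
    """
    labels = np.argmax(logits_per_sample, axis=1)  # predicted label of each generated image
    counts = np.bincount(labels, minlength=logits_per_sample.shape[1])  # votes per class
    # np.argmax picks the smallest class index among tied counts (simple, deterministic tie-break)
    return int(np.argmax(counts))


# Usage: five generated images, three classes; per-image labels are [0, 1, 0, 2, 0]
logits = np.array([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [1.8, 0.2, 0.0],
    [0.1, 0.2, 3.0],
    [2.2, 0.3, 0.1],
])
print(vote_prediction(logits))  # 0
```

Returning a single hard label keeps the interface identical to an undefended classifier; a soft-voting variant would average class probabilities instead, but the figure captions describe label voting.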

Abstract

Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. Adversarial purification has been an effective means of improving DNN robustness by removing these perturbations before the data are fed into the model. However, it struggles to preserve the key structural and semantic information of the data: because adversarial perturbations are imperceptible, it is hard to avoid over-correcting, which can destroy important information and degrade model performance. In this paper, we break away from traditional adversarial purification methods by focusing on the clean data manifold. To this end, we show that samples generated by a well-trained generative model are close to clean samples but far from adversarial ones. Leveraging this insight, we propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples that restore the clean data. Specifically, 1) we propose a Perceptual consistency restoration mechanism that minimizes the discrepancy between generated and input samples in both pixel and perceptual spaces. 2) To keep the optimized latent vectors within the valid data manifold, we introduce a Latent distribution consistency constraint strategy that aligns generated samples with the clean data distribution. 3) We also apply a Latent vector consistency prediction scheme that ensembles predictions over the generated samples to enhance prediction reliability. By restoring inputs on the clean data manifold, CMAP addresses adversarial perturbations at their source and provides robust purification. Extensive experiments on CIFAR-10 and ImageNet-100 show that CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.
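
The objective that drives the latent optimization can be sketched as $\mathcal{L} = \mathcal{L}_a + \beta\,\mathcal{L}_d$, where $\mathcal{L}_a$ combines an MAE and an SSIM term (Figure 2) and $\mathcal{L}_d$ keeps the latent vectors Gaussian. This is a minimal illustration under explicit assumptions: the equal weighting of the MAE and SSIM terms, the single-window ("global") SSIM in place of the usual windowed SSIM, and the moment-matching form of $\mathcal{L}_d$ are our own choices, and `cmap_objective` is a hypothetical name. Only $\beta = 5 \times 10^{-4}$ (Figure 3) and the latent scale $\sigma = 80$ (Figure 5) come from the paper. The gradient update of the latent vectors through the consistency model is omitted; it would require an autodiff framework.

```python
import numpy as np


def mae(x_gen: np.ndarray, x_in: np.ndarray) -> float:
    # Pixel-space term: mean absolute error between generated and input images
    return float(np.mean(np.abs(x_gen - x_in)))


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    # Simplified SSIM over the whole image as one window (standard SSIM slides a local window)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = np.mean((x - mu_x) * (y - mu_y))
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def latent_gaussian_penalty(z: np.ndarray, sigma: float = 80.0) -> float:
    # Hypothetical L_d: penalize the latent's empirical mean/std drifting from N(0, sigma^2 I)
    return float(z.mean() ** 2 / sigma**2 + (z.std() / sigma - 1.0) ** 2)


def cmap_objective(x_gen, x_in, z, beta=5e-4, sigma=80.0) -> float:
    # L = L_a + beta * L_d, with L_a = MAE + (1 - SSIM) weighted equally (assumption)
    l_a = mae(x_gen, x_in) + (1.0 - global_ssim(x_gen, x_in))
    return l_a + beta * latent_gaussian_penalty(z, sigma)


# Usage on random data in [0, 1]; z is rescaled to have mean 0 and std exactly 80
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(3, 32, 32))
z0 = rng.standard_normal(3 * 32 * 32)
z = (z0 - z0.mean()) / z0.std() * 80.0
x_noisy = np.clip(x + 0.1 * rng.standard_normal(x.shape), 0.0, 1.0)
print(cmap_objective(x, x, z))        # ~0: identical images, well-scaled latent
print(cmap_objective(x_noisy, x, z))  # larger: the pixel and SSIM terms grow
```

Scaling $\mathcal{L}_d$ by $\sigma$ keeps it roughly unit-sized, so $\beta$ alone sets the trade-off that Figure 3 explores.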

Paper Structure

This paper contains 27 sections, 2 theorems, 26 equations, 8 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

Assuming that the natural data distribution is $p(\mathbf{x})=\mathcal{N}(\boldsymbol{\mu}_{\mathbf{x}}, \sigma_{\mathbf{x}}^2\mathbf{I})$, where $\mathbf{I}$ is the identity matrix, and given PF ODE sampling $\mathrm{d}\mathbf{x}=-t\,\nabla_{\mathbf{x}}\log p_{t}(\mathbf{x})\,\mathrm{d}t$ with $\mathbf{f}(\mathbf{x},t)=\ldots$ … where $\boldsymbol{\mu}_\epsilon=\left(\mathbb{E}_t \frac{t}{\sigma_{\mathbf{x}}^2+t^2}-1\right)\ldots$ …
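
The Gaussian assumption in Theorem 1 makes the PF ODE analytically tractable. As a sanity check (our own derivation under the theorem's stated assumption, not the truncated expression for $\mathbf{f}$ above), assume the noising $\mathbf{x}_t=\mathbf{x}+t\,\mathbf{n}$ with $\mathbf{n}\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ that corresponds to this PF ODE. Then $p_t=\mathcal{N}(\boldsymbol{\mu}_{\mathbf{x}},(\sigma_{\mathbf{x}}^2+t^2)\mathbf{I})$, and along each ODE trajectory $\mathbf{x}_t-\boldsymbol{\mu}_{\mathbf{x}}$ scales in proportion to $\sqrt{\sigma_{\mathbf{x}}^2+t^2}$. The sketch below integrates the ODE numerically with Heun's method and compares the result against this closed form; $T=80$, $\epsilon=0.002$, and $\sigma_{\mathbf{x}}=0.5$ are illustrative values, and the function names are hypothetical.

```python
import numpy as np


def gaussian_score(x, t, mu, sigma_x):
    # For p(x) = N(mu, sigma_x^2 I) and x_t = x + t * n, p_t = N(mu, (sigma_x^2 + t^2) I)
    return -(x - mu) / (sigma_x**2 + t**2)


def pf_ode_heun(x_T, T, eps, mu, sigma_x, n_steps=10_000):
    # Integrate dx/dt = -t * score(x, t) from t = T down to t = eps with Heun's method
    ts = np.linspace(T, eps, n_steps + 1)
    x = np.asarray(x_T, dtype=float).copy()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        h = t_next - t_cur  # negative: we integrate backward in time
        d_cur = -t_cur * gaussian_score(x, t_cur, mu, sigma_x)
        x_pred = x + h * d_cur  # Euler predictor
        d_next = -t_next * gaussian_score(x_pred, t_next, mu, sigma_x)
        x = x + h * 0.5 * (d_cur + d_next)  # trapezoidal corrector
    return x


def pf_ode_closed_form(x_T, T, eps, mu, sigma_x):
    # Exact solution: x_t - mu scales in proportion to sqrt(sigma_x^2 + t^2)
    scale = np.sqrt(sigma_x**2 + eps**2) / np.sqrt(sigma_x**2 + T**2)
    return mu + (np.asarray(x_T, dtype=float) - mu) * scale


# Illustrative values: T = 80, eps = 0.002, sigma_x = 0.5
mu = np.array([0.1, 0.2, -0.3, 0.0])
x_T = np.array([10.0, -5.0, 3.0, 0.0])
numeric = pf_ode_heun(x_T, 80.0, 0.002, mu, 0.5)
exact = pf_ode_closed_form(x_T, 80.0, 0.002, mu, 0.5)
print(np.max(np.abs(numeric - exact)))  # small discretization error
```

Both functions should agree to within the integrator's discretization error, which shrinks quadratically with the step size for Heun's method.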

Figures (8)

  • Figure 1: Histograms of MMD distances (gretton2012kernel) between features of clean (Cln) vs. clean samples, generated (Gen) vs. clean samples, and adversarial (Adv) vs. clean samples on CIFAR-10 and ImageNet. The generated samples are close to the clean ones but far from the adversarial ones.
  • Figure 2: Overview of the proposed CMAP. (a) Given a pre-trained consistency model, we optimize a set of latent vectors $\{\tilde{\mathbf{z}}_i\}$ within its latent space $\mathcal{Z}$ to generate samples $\{\tilde{\mathbf{x}}_i\}$ that stay close to the original test sample $\hat{\mathbf{x}}$ while removing potential adversarial perturbations, using the perceptual consistency restoration mechanism and the latent distribution consistency constraint strategy (illustrated here with $\hat{\mathbf{x}}$ as an adversarial sample). The perceptual consistency restoration mechanism employs $\mathcal{L}_a$, consisting of MAE and SSIM losses, to align generated samples with the clean data manifold, while the latent distribution consistency constraint strategy uses a Gaussian distribution constraint loss $\mathcal{L}_{d}$ to keep the optimized vectors $\{\tilde{\mathbf{z}}_i\}$ within the valid manifold. (b) After optimization, a latent vector consistency prediction scheme applies label voting across the final generated images to determine the final prediction for the test sample $\hat{\mathbf{x}}$.
  • Figure 3: Standard and robust accuracy curves under different $\beta$ against the PGD+EOT attack with $\ell_{\infty}$-norm ($\epsilon=8/255$) and $\ell_{2}$-norm ($\epsilon=0.5$) on CIFAR-10, using WideResNet-28-10 as the classifier. Without the constraint ($\beta=0$), robust accuracy drops significantly, while an excessive constraint ($\beta=5 \times 10^{-3}$) hurts both standard and robust accuracy. In contrast, $\beta=5 \times 10^{-4}$ achieves a favorable balance, suppressing adversarial perturbations while maintaining effective image restoration.
  • Figure 4: Robust accuracy across iterations against the consistency-disruption attack with $\ell_{\infty}$-norm and $\ell_{2}$-norm, using WideResNet-28-10 on CIFAR-10 and ResNet50 on ImageNet-100. The alignment iterations are $T_\mathrm{def}=200$ for CIFAR-10 and $T_\mathrm{def}=300$ for ImageNet-100, and the subsequent attack iterations $T_\mathrm{adv}$ extend to $1000$. CMAP maintains high robust accuracy throughout.
  • Figure 5: Mean and standard deviation of the optimized latent vectors under the consistency-disruption attack ($\epsilon=8/255$) across optimization iterations on CIFAR-10, where $\boldsymbol{\mu}=0$ and $\sigma=80$ are the mean and standard deviation of the latent distribution.
  • ...and 3 more figures
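
Figure 1 compares feature sets with kernel MMD (gretton2012kernel). A minimal sketch of the unbiased squared-MMD estimator with a Gaussian kernel follows, assuming features have already been extracted into (num_samples, feature_dim) arrays. The bandwidth and the synthetic Gaussian "features" in the usage example are illustrative assumptions, unrelated to the paper's feature extractor or settings.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    # Gaussian kernel matrix k(u, v) = exp(-||u - v||^2 / (2 * bandwidth^2))
    sq_dists = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    # Unbiased estimate of squared MMD between samples x (m, d) and y (n, d)
    m, n = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))  # drop i == j pairs
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


# Usage with synthetic 8-D "features": a and b share a distribution, c is mean-shifted
rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, size=(200, 8))
b = rng.normal(0.0, 1.0, size=(200, 8))
c = rng.normal(1.0, 1.0, size=(200, 8))
print(mmd2_unbiased(a, b, bandwidth=3.0))  # close to 0
print(mmd2_unbiased(a, c, bandwidth=3.0))  # clearly positive
```

The unbiased estimator can dip slightly below zero when both samples come from the same distribution, which is expected; histograms like those in Figure 1 would be built from many such estimates.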

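Figure 3 evaluates CMAP under PGD+EOT, where Expectation over Transformation averages input gradients across several random runs of a randomized defense before each projected step. A generic sketch of that loop follows, assuming a user-supplied gradient oracle. `grad_fn`, `pgd_eot`, the step size, step count, and EOT sample count are hypothetical, illustrative choices rather than the paper's attack configuration; only the budgets $\epsilon=8/255$ ($\ell_\infty$) and $\epsilon=0.5$ ($\ell_2$) come from the caption, and the paper's consistency-disruption attack is not reproduced here.

```python
import numpy as np


def project_linf(x_adv, x, eps):
    # Project onto the l_inf ball of radius eps around the clean input x
    return np.clip(x_adv, x - eps, x + eps)


def project_l2(x_adv, x, eps):
    # Project onto the l_2 ball of radius eps around the clean input x
    delta = x_adv - x
    norm = np.linalg.norm(delta)
    return x + delta * (eps / norm) if norm > eps else x_adv


def pgd_eot(x, grad_fn, eps=8 / 255, alpha=2 / 255, steps=10, eot_samples=8, norm="linf", seed=0):
    # grad_fn(x_adv, rng) is a hypothetical oracle: the gradient of the attack loss
    # w.r.t. the input for ONE random run of a randomized defense
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    for _ in range(steps):
        # EOT: average the gradient over several random runs of the defense
        g = np.mean([grad_fn(x_adv, rng) for _ in range(eot_samples)], axis=0)
        if norm == "linf":
            x_adv = project_linf(x_adv + alpha * np.sign(g), x, eps)
        else:  # "l2": step along the normalized gradient
            g_norm = np.linalg.norm(g)
            x_adv = project_l2(x_adv + alpha * g / max(g_norm, 1e-12), x, eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)  # stay in the valid image range
    return x_adv


# Usage with a toy linear loss <w, x> whose gradient is perturbed by noise
w = np.random.default_rng(1).standard_normal((3, 8, 8))


def toy_grad(x_adv, rng):
    return w + 0.1 * rng.standard_normal(x_adv.shape)


x_clean = np.random.default_rng(2).uniform(0.2, 0.8, size=(3, 8, 8))
x_linf = pgd_eot(x_clean, toy_grad)
x_l2 = pgd_eot(x_clean, toy_grad, eps=0.5, alpha=0.1, norm="l2")
```

Clipping to [0, 1] after projection never increases the perturbation, because the clean input already lies in that range, so both threat-model budgets remain satisfied.
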
Theorems & Definitions (9)

  • Definition 1
  • Remark 1
  • Theorem 1
  • Proof
  • Proposition 1
  • Proof
  • Proof
  • Remark 2
  • Proof