ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu

TL;DR

ADBM introduces a diffusion-based adversarial purification method that constructs a direct reverse bridge from the diffused adversarial data to clean data, addressing DiffPure's trade-off between noise removal and data preservation. The approach includes a classifier-guided adversarial-noise generation process, a tailored training objective L_b, and the ability to use fast samplers like DDIM for efficient inference. The authors provide theoretical guarantees and demonstrate through reliable adaptive evaluations that ADBM achieves superior robustness to both seen and unseen threats, including black-box attacks, often outperforming adversarial training and DiffPure baselines. Overall, ADBM offers a practical, plug-and-play defense with competitive robustness and transferability, suitable for safeguarding foundation models without retraining them entirely.

Abstract

Recently, Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples. However, we find that DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, is suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations for DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to its original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, ADBM has proven to be a superior and robust defense mechanism, offering significant promise for practical applications.

Paper Structure (36 sections, 4 theorems, 44 equations, 2 figures, 16 tables)

This paper contains 36 sections, 4 theorems, 44 equations, 2 figures, 16 tables.

Key Result

Theorem 1

Given an adversarial example ${\mathbf{x}}_0^a$ and assuming the training loss $L_b \leq \delta$, the distance between the purified example of ADBM and the clean example ${\mathbf{x}}_0$, denoted as $\|\hat{{\mathbf{x}}}_0-{\mathbf{x}}_0\|$, is bounded by $\delta$ (constant omitted) in expectation w

Figures (2)

  • Figure 1: The inference pipeline of AP (a) and the comparison between DiffPure (b) and ADBM (c). DiffPure relies on the diffused adversarial data distribution (Diffused Adv. Dist.) being close enough to the diffused clean data distribution. ADBM directly builds a reverse process from the diffused adversarial data distribution to clean data distribution.
  • Figure 2: The illustration of ADBM training. The blue block represents the training objective, and the green block represents the extra module for adversarial noise generation. Black arrows denote the computation of $L_b$, blue arrows denote the computation of $L_c$ used for generating adversarial noise, and red arrows denote the direction of the gradient flow when calculating ${\boldsymbol{\epsilon}}_a$.
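The trade-off behind Figure 1(b) can be illustrated with a small numerical sketch. It is not ADBM's training or inference code. It only assumes the standard DDPM-style forward process, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, with a linear beta schedule. The schedule constants, the toy 16-dimensional "image", the 8/255 perturbation size, and the helper names `alpha_bar` and `diffuse` are all assumptions made for illustration. Under this forward process, diffusing a clean input and its adversarial counterpart with the same noise leaves a gap of $\sqrt{\bar{\alpha}_t}$ times the perturbation, so the gap shrinks only as the clean signal itself is attenuated.

```python
import math
import random


def alpha_bar(t, T=1000, beta_start=1e-4, beta_end=2e-2):
    """Return prod_{s=1..t} (1 - beta_s) for an assumed linear beta schedule."""
    prod = 1.0
    for s in range(t):
        beta = beta_start + (beta_end - beta_start) * s / (T - 1)
        prod *= 1.0 - beta
    return prod


def diffuse(x0, t, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bar(t)
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


# Toy data: a random "clean" vector and a sign perturbation of size 8/255.
# The perturbation is a stand-in, not the output of a real attack.
data_rng = random.Random(0)
x_clean = [data_rng.uniform(-1.0, 1.0) for _ in range(16)]
eps_adv = [data_rng.choice((-1.0, 1.0)) * 8 / 255 for _ in x_clean]
x_adv = [c + e for c, e in zip(x_clean, eps_adv)]

for t in (0, 100, 300, 600):
    a = alpha_bar(t)
    signal_scale = math.sqrt(a)       # how much of x_0 survives at step t
    mean_gap = signal_scale * 8 / 255  # per-coordinate gap between diffused means
    noise_std = math.sqrt(1.0 - a)     # std of the injected Gaussian noise
    print(f"t={t:4d}  signal={signal_scale:.3f}  gap={mean_gap:.4f}  noise_std={noise_std:.3f}")
```

In the printed table, the gap and the signal scale fall together: a step `t` large enough to wash out the perturbation also discards much of the clean content. This is the tension that, according to the caption, ADBM sidesteps by learning a reverse process from the diffused adversarial distribution directly to the clean data distribution.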

Theorems & Definitions (9)

  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof