ADBM: Adversarial diffusion bridge model for reliable adversarial purification

Xiao Li; Wenxuan Sun; Huanran Chen; Qiongxiu Li; Yining Liu; Yingzhe He; Jie Shi; Xiaolin Hu

TL;DR

ADBM introduces a diffusion-based adversarial purification method that constructs a direct reverse bridge from the diffused adversarial data to clean data, addressing DiffPure's trade-off between noise removal and data preservation. The approach includes a classifier-guided adversarial-noise generation process, a tailored training objective L_b, and the ability to use fast samplers like DDIM for efficient inference. The authors provide theoretical guarantees and demonstrate through reliable adaptive evaluations that ADBM achieves superior robustness to both seen and unseen threats, including black-box attacks, often outperforming adversarial training and DiffPure baselines. Overall, ADBM offers a practical, plug-and-play defense with competitive robustness and transferability, suitable for safeguarding foundation models without retraining them entirely.
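The mention of fast samplers such as DDIM can be made concrete with a small sketch of the standard deterministic DDIM update (the eta = 0 case). This is not the authors' implementation. The scalar "image", the noise levels `abar_t` and `abar_s`, and the oracle noise prediction `eps` are all assumptions chosen for illustration. In a real purifier the noise estimate would come from a trained network, which in ADBM's case is trained with $L_b$.

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_s):
    """One deterministic DDIM (eta = 0) update from noise level abar_t to abar_s.

    abar_* are cumulative signal-retention factors in (0, 1]; larger means less noise.
    """
    # Estimate the clean sample implied by the current noise prediction.
    x0_pred = (x_t - math.sqrt(1.0 - abar_t) * eps_pred) / math.sqrt(abar_t)
    # Re-noise that estimate to the target level using the same predicted noise.
    return math.sqrt(abar_s) * x0_pred + math.sqrt(1.0 - abar_s) * eps_pred

# Hypothetical scalar example: diffuse x0 with a known noise draw, then step back.
x0, eps = 0.7, -1.3
abar_t, abar_s = 0.5, 0.9
x_t = math.sqrt(abar_t) * x0 + math.sqrt(1.0 - abar_t) * eps

x_s = ddim_step(x_t, eps, abar_t, abar_s)    # intermediate, less noisy state
x_clean = ddim_step(x_t, eps, abar_t, 1.0)   # a single jump to the noise-free level
print(x_s, x_clean)
```

Because each update is deterministic and can skip over intermediate noise levels, a handful of such steps can stand in for a long reverse chain, which is why this kind of sampler is attractive for inference-time purification.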

Abstract

Recently, Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples. However, we find that DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, is suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations for DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to the original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, ADBM proves to be a superior and robust defense mechanism, offering significant promise for practical applications.
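The trade-off between noise purification and data recovery can be illustrated numerically with the standard forward diffusion $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. If the clean and adversarial inputs share the same noise draw, their diffused versions differ by $\sqrt{\bar\alpha_t}$ times the perturbation, and the clean signal is scaled by the same factor. The sketch below assumes a linear beta schedule with 1000 steps (common DDPM defaults, not values taken from this paper), and the function names are hypothetical.

```python
import math

def alpha_bar(t, num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under an assumed linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_start + (beta_end - beta_start) * (s - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def diffusion_tradeoff(t, perturbation_norm):
    """Return (signal scale, Gaussian noise std, remaining adversarial gap) at step t."""
    a = alpha_bar(t)
    signal_scale = math.sqrt(a)        # weight on the original image inside x_t
    noise_scale = math.sqrt(1.0 - a)   # std of the injected Gaussian noise
    residual_gap = signal_scale * perturbation_norm  # distance between diffused adv. and clean
    return signal_scale, noise_scale, residual_gap

for t in (0, 100, 500, 1000):
    s, n, g = diffusion_tradeoff(t, perturbation_norm=1.0)
    print(f"t={t:4d}  signal={s:.4f}  noise_std={n:.4f}  adv_gap={g:.4f}")
```

A larger `t` shrinks the adversarial gap, but it shrinks the retained signal by the same factor, so the two cannot be made arbitrarily good at once when the reverse process is left unchanged.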

Paper Structure (36 sections, 4 theorems, 44 equations, 2 figures, 16 tables)

Introduction
Preliminary and Related Work
Reliable Evaluation for DiffPure
Adversarial Diffusion Bridge Model
Training Objective
Adversarial Noise Generation
AP Inference of ADBM
Theoretical Analysis
Experiments
Experimental Settings
Robustness against White-Box Adaptive Attacks
Robustness against Black-Box Attacks
Ablation Study
Conclusion and Discussion
Discussions on Diffusion Bridges and Diffusion Classifiers
...and 21 more sections

Key Result

Theorem 1

Given an adversarial example ${\mathbf{x}}_0^a$ and assuming the training loss $L_b \leq \delta$, the distance between the purified example of ADBM and the clean example ${\mathbf{x}}_0$, denoted as $\|\hat{{\mathbf{x}}}_0-{\mathbf{x}}_0\|$, is bounded by $\delta$ (constant omitted) in expectation w

Figures (2)

Figure 1: The inference pipeline of AP (a) and the comparison between DiffPure (b) and ADBM (c). DiffPure relies on the diffused adversarial data distribution (Diffused Adv. Dist.) being close enough to the diffused clean data distribution. ADBM directly builds a reverse process from the diffused adversarial data distribution to clean data distribution.
Figure 2: The illustration of ADBM training. The blue block represents the training objective, and the green block represents the extra module for adversarial noise generation. Black arrows denote the computation of $L_b$, blue arrows denote the computation of $L_c$ used for generating adversarial noise, and red arrows denote the direction of the gradient flow when calculating ${\boldsymbol{\epsilon}}_a$.

Theorems & Definitions (9)

Theorem 1
proof
Theorem 2
proof
proof
Theorem 1
proof
Theorem 2
proof
