ADBM: Adversarial diffusion bridge model for reliable adversarial purification
Xiao Li, Wenxuan Sun, Huanran Chen, Qiongxiu Li, Yining Liu, Yingzhe He, Jie Shi, Xiaolin Hu
TL;DR
ADBM introduces a diffusion-based adversarial purification method that constructs a direct reverse bridge from the diffused adversarial data to clean data, addressing DiffPure's trade-off between noise removal and data preservation. The approach includes a classifier-guided adversarial-noise generation process, a tailored training objective L_b, and the ability to use fast samplers like DDIM for efficient inference. The authors provide theoretical guarantees and demonstrate through reliable adaptive evaluations that ADBM achieves superior robustness to both seen and unseen threats, including black-box attacks, often outperforming adversarial training and DiffPure baselines. Overall, ADBM offers a practical, plug-and-play defense with competitive robustness and transferability, suitable for safeguarding foundation models without retraining them entirely.
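The inference pipeline that DiffPure-style purifiers share can be sketched as follows: diffuse the (possibly adversarial) input to an intermediate timestep t_star, then run a fast deterministic DDIM sampler back to a clean estimate. This is a minimal sketch under stated assumptions, not the authors' implementation. The DDPM linear beta schedule, t_star=100, the five DDIM steps, and the placeholder noise predictor `eps_model` are all hypothetical choices. The sketch does not implement ADBM's training objective L_b or its classifier-guided adversarial-noise generation; in ADBM those would shape the network that `eps_model` stands in for.

```python
import math
import random


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a DDPM-style linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def diffuse(x0, t, alpha_bars, rng):
    """Forward diffusion q(x_t | x_0): x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * eps."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in x0]


def ddim_step(x_t, eps_pred, t, t_prev, alpha_bars):
    """One deterministic DDIM update (eta = 0); t_prev = -1 returns the clean estimate."""
    ab_t = alpha_bars[t]
    ab_prev = alpha_bars[t_prev] if t_prev >= 0 else 1.0
    # Estimate x_0 from the current sample and the predicted noise.
    x0_pred = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise the estimate to the smaller noise level of t_prev with the same eps.
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]


def purify(x_adv, eps_model, t_star=100, num_steps=5, seed=0):
    """Diffuse the input to t_star, then run up to num_steps DDIM steps back to t = 0.

    eps_model(x_t, t) -> predicted noise is a hypothetical placeholder for the trained
    network (a pre-trained DiffPure model or an ADBM bridge model).
    """
    rng = random.Random(seed)
    alpha_bars = linear_alpha_bars()
    x = diffuse(x_adv, t_star, alpha_bars, rng)
    stride = max(1, t_star // num_steps)
    ts = list(range(t_star, 0, -stride))  # e.g. [100, 80, 60, 40, 20]
    for t, t_prev in zip(ts, ts[1:] + [-1]):
        x = ddim_step(x, eps_model(x, t), t, t_prev, alpha_bars)
    return x


def zero_eps(x, t):
    """Hypothetical stand-in predictor that always predicts zero noise."""
    return [0.0] * len(x)


print(purify([0.1, -0.2, 0.3], zero_eps))
```

Using only a handful of DDIM steps instead of the full ancestral chain is what keeps inference cheap; the quality of the purified output then depends entirely on how well the trained predictor maps the diffused adversarial input back to clean data.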
Abstract
Recently, Diffusion-based Purification (DiffPure) has been recognized as an effective defense against adversarial examples. However, we find that DiffPure, which directly employs the original pre-trained diffusion models for adversarial purification, is suboptimal. This is due to an inherent trade-off between noise purification performance and data recovery quality. Additionally, the reliability of existing evaluations of DiffPure is questionable, as they rely on weak adaptive attacks. In this work, we propose a novel Adversarial Diffusion Bridge Model, termed ADBM. ADBM directly constructs a reverse bridge from the diffused adversarial data back to its original clean examples, enhancing the purification capabilities of the original diffusion models. Through theoretical analysis and experimental validation across various scenarios, we show that ADBM is a superior and robust defense mechanism, offering significant promise for practical applications.
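One rough way to see the trade-off, assuming the standard DDPM forward process used by DiffPure-style methods: diffusing to timestep t scales the adversarial input x + delta by sqrt(alpha_bar_t) and adds Gaussian noise with standard deviation sqrt(1 - alpha_bar_t). The sketch below assumes the DDPM linear beta schedule and a per-pixel perturbation of 8/255; both are illustrative choices, not values taken from the paper. As t grows, the perturbation is swamped by noise (better purification), but the clean content is attenuated by the same factor, so the reverse process has more to reconstruct (worse data recovery).

```python
import math


def linear_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a DDPM-style linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def tradeoff_table(delta=8 / 255, timesteps=(50, 100, 200, 400)):
    """Per-timestep view of x_t = sqrt(ab_t) * (x + delta) + sqrt(1 - ab_t) * eps.

    Returns (t, signal_scale, perturbation_to_noise): signal_scale = sqrt(ab_t)
    multiplies both the clean image x and the perturbation delta, and
    perturbation_to_noise compares the scaled per-pixel perturbation with the
    Gaussian noise std sqrt(1 - ab_t).
    """
    alpha_bars = linear_alpha_bars()
    rows = []
    for t in timesteps:
        signal_scale = math.sqrt(alpha_bars[t])
        noise_std = math.sqrt(1.0 - alpha_bars[t])
        rows.append((t, signal_scale, signal_scale * delta / noise_std))
    return rows


for t, scale, ratio in tradeoff_table():
    print(f"t={t:4d}  signal scale={scale:.3f}  perturbation/noise={ratio:.4f}")
```

Both columns shrink as t increases, which is the tension a single choice of t_star has to balance when the reverse model is the original pre-trained diffusion model.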
