What is Adversarial Training for Diffusion Models?
Briglia Maria Rosaria, Mujtaba Hussain Mirza, Giuseppe Lisanti, Iacopo Masi
TL;DR
This paper addresses the robustness of diffusion models (DMs) to noisy, outlier, and adversarial data by introducing adversarial training (AT) designed specifically for diffusion models (AT-DM). It shows that, whereas AT for classifiers enforces invariance, AT for DMs must enforce equivariance so that the diffusion process stays aligned with the data distribution. This is achieved by perturbing diffusion trajectories with time-varying noise and adding a dedicated regularizer. The authors formalize an AT objective, combine it with the standard denoising loss, and evaluate it on synthetic datasets with known distributions (in low- and high-dimensional space) and on real-world benchmarks (CIFAR-10, CelebA, LSUN Bedroom). Their adversarially trained model, Robust$_{\text{adv}}$, yields smoother diffusion flows, less memorization, and faster sampling, while improving robustness to severe noise and adversarial attacks. The work broadens the applicability of DMs to noisy or adversarial settings and points to practical benefits for denoising corrupted data distributions and, potentially, for adversarial purification in real-world deployments.
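One way to make the invariance/equivariance contrast concrete is the short derivation below. It assumes the standard noise-prediction (DDPM-style) forward process with $\sigma_t = \sqrt{1-\bar\alpha_t}$; the weight $\lambda$, the radius $\rho$, and the exact form of the regularizer are illustrative assumptions, not the authors' objective (which, per the summary above, also uses time-varying noise).

$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sigma_t\,\epsilon, \qquad \hat x_0(x_t) = \frac{x_t - \sigma_t\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}$$

Perturbing a point on the trajectory by $\delta$ can be rewritten as a change of the noise that produced it:

$$x_t + \delta = \sqrt{\bar\alpha_t}\,x_0 + \sigma_t\left(\epsilon + \frac{\delta}{\sigma_t}\right)$$

so the clean-data estimate $\hat x_0(x_t+\delta)$ stays at $x_0$ only if the noise prediction moves together with the input (equivariance), unlike a classifier, where the output should not move at all (invariance):

$$\epsilon_\theta(x_t+\delta,t) = \epsilon + \frac{\delta}{\sigma_t} \quad \text{(DM: equivariance)}, \qquad f(x+\delta) = f(x) \quad \text{(classifier: invariance)}$$

A training loss that adds such a regularizer to the denoising term, with $\delta$ chosen adversarially inside a ball of radius $\rho$ (or drawn at random, replacing the max with an expectation), could read:

$$\mathcal{L}(\theta) = \mathbb{E}\big[\|\epsilon_\theta(x_t,t)-\epsilon\|^2\big] + \lambda\,\mathbb{E}\Big[\max_{\|\delta\|\le\rho}\big\|\epsilon_\theta(x_t+\delta,t)-\epsilon-\delta/\sigma_t\big\|^2\Big]$$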
Abstract
We answer the question in the title, showing that adversarial training (AT) for diffusion models (DMs) differs fundamentally from AT for classifiers: while AT in classifiers enforces output invariance, AT in DMs requires equivariance to keep the diffusion process aligned with the data distribution. AT enforces smoothness in the diffusion flow, improving robustness to outliers and corrupted data. Unlike prior art, our method makes no assumptions about the noise model and integrates seamlessly into diffusion training by adding either random noise, similar to randomized smoothing, or adversarial noise, akin to AT. This enables intrinsic capabilities such as handling noisy data, dealing with extreme variability such as outliers, preventing memorization, and improving robustness. We rigorously evaluate our approach on proof-of-concept datasets with known distributions in low- and high-dimensional spaces, where errors can be measured exactly; we further evaluate on standard benchmarks such as CIFAR-10, CelebA, and LSUN Bedroom, showing strong performance under severe noise, data corruption, and iterative adversarial attacks.
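The NumPy sketch below illustrates, on a proof-of-concept distribution known exactly ($x_0 \sim \mathcal{N}(0, I_2)$), the two ways of drawing the perturbation mentioned above: Gaussian noise in the spirit of randomized smoothing, and a one-step sign-gradient (FGSM-style) perturbation in the spirit of AT, both fed to an equivariance regularizer on the noise prediction. Everything here is a hypothetical choice for illustration, not the authors' implementation: the linear noise predictor, the single timestep, the linear $\beta$ schedule, the weight `lam`, the `radius`, and the helper names (`fit`, `make_delta`, `equivariance_gap`, and so on).

```python
import numpy as np

# Hypothetical toy setup (not the authors' code): a linear noise predictor at a single
# timestep t, trained on data from a known distribution x0 ~ N(0, I_2).
rng = np.random.default_rng(0)


def alpha_bar_schedule(T=1000, beta_min=1e-4, beta_max=0.02):
    """Assumed DDPM-style linear beta schedule; returns alpha_bar_t for t = 0..T-1."""
    betas = np.linspace(beta_min, beta_max, T)
    return np.cumprod(1.0 - betas)


def predict_eps(params, x):
    """Linear noise predictor eps_hat(x) = x @ W.T + b (a stand-in for a U-Net)."""
    W, b = params
    return x @ W.T + b


def predict_x0(params, x_t, a_bar):
    """Clean-data estimate implied by the predicted noise."""
    sigma = np.sqrt(1.0 - a_bar)
    return (x_t - sigma * predict_eps(params, x_t)) / np.sqrt(a_bar)


def make_delta(params, x_t, eps, sigma, radius, mode):
    """Gaussian delta (randomized-smoothing style) or one-step sign-gradient delta (FGSM style)."""
    if mode == "random":
        return radius * rng.standard_normal(x_t.shape)
    # Regularizer residual: r(delta) = r0 + delta @ A.T with A = W - I / sigma,
    # so the gradient of ||r||^2 at delta = 0 is 2 * r0 @ A (one row per sample).
    W, _ = params
    A = W - np.eye(W.shape[0]) / sigma
    grad = 2.0 * (predict_eps(params, x_t) - eps) @ A
    return radius * np.sign(grad)


def fit(x_t, eps, sigma, lam, radius, mode, rounds=5):
    """Alternate between crafting deltas with the current params and refitting by least squares.

    Per-sample objective: ||eps_hat(x_t) - eps||^2
                          + lam * ||eps_hat(x_t + delta) - (eps + delta / sigma)||^2
    """
    n, d = x_t.shape
    params = (np.zeros((d, d)), np.zeros(d))
    w = np.sqrt(lam)  # row weight that turns the weighted sum into one least-squares problem
    for _ in range(rounds):
        delta = make_delta(params, x_t, eps, sigma, radius, mode)
        inputs = np.vstack([x_t, w * (x_t + delta)])
        bias_col = np.concatenate([np.ones(n), np.full(n, w)])[:, None]
        targets = np.vstack([eps, w * (eps + delta / sigma)])
        theta, *_ = np.linalg.lstsq(np.hstack([inputs, bias_col]), targets, rcond=None)
        params = (theta[:d].T, theta[d])
    return params


def equivariance_gap(params, x_t, delta, a_bar):
    """Mean shift of the clean-data estimate when x_t is perturbed by delta (0 = fully equivariant)."""
    shift = predict_x0(params, x_t + delta, a_bar) - predict_x0(params, x_t, a_bar)
    return float(np.mean(np.linalg.norm(shift, axis=1)))


n, d, t = 4000, 2, 100
a_bar = alpha_bar_schedule()[t]
sigma = np.sqrt(1.0 - a_bar)
x0 = rng.standard_normal((n, d))  # known data distribution N(0, I_2)
eps = rng.standard_normal((n, d))
x_t = np.sqrt(a_bar) * x0 + sigma * eps
eval_delta = 0.5 * rng.standard_normal((n, d))  # held-out perturbations shared by all models

models = {
    "standard": fit(x_t, eps, sigma, lam=0.0, radius=0.5, mode="random"),
    "random": fit(x_t, eps, sigma, lam=1.0, radius=0.5, mode="random"),
    "adversarial": fit(x_t, eps, sigma, lam=1.0, radius=0.5, mode="adversarial"),
}
gaps = {name: equivariance_gap(p, x_t, eval_delta, a_bar) for name, p in models.items()}
for name, gap in gaps.items():
    print(f"{name:>11}: equivariance gap = {gap:.3f}")
```

With these settings the two regularized models should show a smaller gap than the standard one. In this linear toy, exact equivariance ($W = I/\sigma_t$) would make the clean-data estimate constant, so `lam` and `radius` trade smoothness against denoising accuracy; a real DM would instead use a neural network, many timesteps, and typically a multi-step attack such as PGD.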
