Improved Diffusion-based Generative Model with Better Adversarial Robustness

Zekun Wang, Mingyang Yi, Shuchen Xue, Zhenguo Li, Ming Liu, Bing Qin, Zhi-Ming Ma

TL;DR

This work tackles a key bottleneck in diffusion-based generative modeling: the distribution mismatch between training and sampling. During training the denoiser sees inputs built from ground-truth data and noise, whereas during sampling it sees inputs produced by its own earlier, imperfect denoising steps. By casting the problem in a Distributionally Robust Optimization (DRO) framework, the authors show that robustness to distributional perturbations is equivalent to adversarial training (AT) for Diffusion Probabilistic Models (DPMs), and they extend the same reasoning to Consistency Models (CMs). They derive a DRO-based objective that leads to an adversarial noise-prediction formulation with provable error bounds, and they translate it into efficient Free-AT implementations for both DPMs and CMs. Empirically, adversarial training improves sample quality (lower FID) and robustness on CIFAR-10, ImageNet 64×64, and MS-COCO 512×512, including latent consistency settings, without sacrificing convergence. The results suggest a practical, scalable way to make diffusion-based generative systems more robust, applicable to both image synthesis and text-to-image tasks.

Abstract

Diffusion Probabilistic Models (DPMs) have achieved significant success in generative tasks. However, their training and sampling processes suffer from the issue of distribution mismatch. During the denoising process, the input data distributions differ between the training and inference stages, potentially leading to inaccurate data generation. To obviate this, we analyze the training objective of DPMs and theoretically demonstrate that this mismatch can be alleviated through Distributionally Robust Optimization (DRO), which is equivalent to performing robustness-driven Adversarial Training (AT) on DPMs. Furthermore, for the recently proposed Consistency Model (CM), which distills the inference process of the DPM, we prove that its training objective also encounters the mismatch issue. Fortunately, this issue can be mitigated by AT as well. Based on these insights, we propose to conduct efficient AT on both DPM and CM. Finally, extensive empirical studies validate the effectiveness of AT in diffusion-based models. The code is available at https://github.com/kugwzk/AT_Diff.
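The adversarial noise-prediction idea can be illustrated with a small NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes a toy linear noise predictor `x @ W`, an L-infinity perturbation ball of radius `eps_ball`, Gaussian stand-in data, and a Free-AT style loop in which each minibatch is replayed a few times and one gradient computation updates both the weights and the perturbation. All names (`free_at_step`, `eps_ball`, `replays`, `abar_t`) are hypothetical.

```python
import numpy as np

def free_at_step(W, x0, abar_t, delta, rng, eps_ball=0.1, lr=1e-2, replays=3):
    """One Free-AT style minibatch update for a toy linear noise predictor.

    Assumed toy model: eps_theta(x) = x @ W. The perturbation `delta` is kept
    inside an L-infinity ball of radius `eps_ball` and persists across calls.
    """
    n = x0.shape[0]
    noise = rng.standard_normal(x0.shape)
    # Forward diffusion sample: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * noise
    loss = np.nan
    for _ in range(replays):
        x_adv = x_t + delta                      # perturbed network input
        resid = x_adv @ W - noise                # noise-prediction error
        loss = np.mean(np.sum(resid ** 2, axis=1))
        grad_W = 2.0 * x_adv.T @ resid / n       # gradient w.r.t. the weights
        grad_x = 2.0 * resid @ W.T / n           # gradient w.r.t. the input
        W = W - lr * grad_W                      # descent step on the model
        # Ascent step on the perturbation, projected back onto the ball
        delta = np.clip(delta + eps_ball * np.sign(grad_x), -eps_ball, eps_ball)
    return W, delta, loss

rng = np.random.default_rng(0)
d, n = 4, 256
W = np.zeros((d, d))
delta = np.zeros((n, d))
for _ in range(200):
    x0 = rng.standard_normal((n, d))             # stand-in "data" batch
    W, delta, loss = free_at_step(W, x0, abar_t=0.5, delta=delta, rng=rng)
print(f"adversarial loss on the last replayed batch: {loss:.3f}")
```

The point of the sketch is the cost structure: the perturbation update reuses the gradient already computed for the weight update, so robustness training adds replays of a batch rather than a separate inner attack loop.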

Paper Structure

This paper contains 46 sections, 20 theorems, 66 equations, 7 figures, 17 tables, 2 algorithms.

Key Result

Proposition 1

The minimization problem in \ref{eq:rewrite nll upper bound} is equivalent to minimizing an upper bound of $\mathbb{E}_{q}[-\log{p_{\boldsymbol{\theta}}}(\boldsymbol{x}_{t})]$ for any $0\leq t \leq T$.

Figures (7)

  • Figure 1: A comparison between standard training and the proposed distributionally robust optimization in \ref{eq:dro objective}. When minimizing $D_{KL}(\tilde{q}_{t}(\boldsymbol{x}_{t}\mid \boldsymbol{x}_{t + 1})\parallel p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}\mid \boldsymbol{x}_{t + 1}))$, $\boldsymbol{x}_{t + 1}$ is sampled from $\tilde{q}_{t}(\boldsymbol{x}_{t + 1})$. Both $\tilde{q}_{t}(\boldsymbol{x}_{t + 1})$ in the training stage and $p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t + 1})$ in the inference stage lie in $B_{D_{KL}}(q(\boldsymbol{x}_{t + 1}), \eta_{0})$, so $p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t})$ tends to lie in $B_{D_{KL}}(q(\boldsymbol{x}_{t}), \eta_{0})$, as does $\tilde{q}_{t}(\boldsymbol{x}_{t})$. The distributional robustness captured by \ref{eq:dro objective} then guarantees that the generated $p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t})$ stays close to $q(\boldsymbol{x}_{t})$ for all $t$.
  • Figure 2: The convergence of methods trained from scratch on CIFAR-10 $32\times32$. We use the DDIM sampler with 50 NFEs for sampling.
  • Figure 3: The convergence of methods fine-tuned from the same pretrained model on CIFAR-10 $32\times32$. We compare the performance of the methods across various samplers.
  • Figure 4: Qualitative comparisons of ADM-AT (top, FID 6.60), ADM-IP (middle, FID 7.81), and ADM (bottom, FID 10.58) on CIFAR-10 $32\times32$. We use the IDDPM sampler with 20 NFEs for sampling.
  • Figure 5: Qualitative comparisons of ADM-AT (top, FID 6.20), ADM-IP (middle, FID 8.40), and ADM (bottom, FID 8.32) on ImageNet $64\times64$. We use the DDIM sampler with 20 NFEs for sampling.
  • ...and 2 more figures
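
Using only the notation in the Figure 1 caption, the DRO idea can be written schematically as below. This is a sketch assembled from the caption, not the paper's exact equation: the sum over $t$, the placement of the supremum, and the direction of the KL divergence in the definition of the ball are all assumptions.

```latex
\min_{\boldsymbol{\theta}} \; \sum_{t=0}^{T-1} \;
\sup_{\tilde{q}_{t}(\boldsymbol{x}_{t+1}) \,\in\, B_{D_{KL}}(q(\boldsymbol{x}_{t+1}),\, \eta_{0})}
\mathbb{E}_{\tilde{q}_{t}(\boldsymbol{x}_{t+1})}
\Big[ D_{KL}\big(\tilde{q}_{t}(\boldsymbol{x}_{t}\mid \boldsymbol{x}_{t+1}) \parallel p_{\boldsymbol{\theta}}(\boldsymbol{x}_{t}\mid \boldsymbol{x}_{t+1})\big) \Big],
\qquad
B_{D_{KL}}(q, \eta_{0}) = \big\{ \tilde{q} : D_{KL}(\tilde{q} \parallel q) \le \eta_{0} \big\}.
```

Standard training corresponds to the special case $\eta_{0} = 0$, where the supremum collapses to $\tilde{q}_{t} = q$ and the model is only fit on inputs drawn from the forward process.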

Theorems & Definitions (32)

  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Theorem 1
  • Proposition 6
  • Remark 1
  • Theorem 2
  • Theorem 3
  • ...and 22 more