Adaptive Moments are Surprisingly Effective for Plug-and-Play Diffusion Sampling

Christian Belardi, Justin Lovelace, Kilian Q. Weinberger, Carla P. Gomes

Abstract

Guided diffusion sampling relies on approximating often intractable likelihood scores, which introduces significant noise into the sampling dynamics. We propose using adaptive moment estimation to stabilize these noisy likelihood scores during sampling. Despite its simplicity, our approach achieves state-of-the-art results on image restoration and class-conditional generation tasks, outperforming more complicated methods, which are often computationally more expensive. We provide empirical analysis of our method on both synthetic and real data, demonstrating that mitigating gradient noise through adaptive moments offers an effective way to improve alignment.
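To make the idea of adaptive moment estimation concrete, here is a minimal sketch of the standard Adam-style running-moment update applied to a noisy gradient, which here stands in for an approximate likelihood score. It is an illustration under stated assumptions, not the authors' algorithm: the function name `adam_moment_step`, the toy `true_grad`, the noise scale, and the default values of `beta1`, `beta2`, and `eps` (the usual Adam defaults) are all hypothetical choices, and how the resulting direction would be combined with a diffusion sampler is not shown.

```python
import numpy as np


def adam_moment_step(grad, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style moment update for a (noisy) guidance gradient.

    grad : current noisy gradient estimate
    m, v : running first and second moment estimates
    t    : 1-based step counter used for bias correction
    Returns the normalized update direction and the updated moments.
    """
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    # Smoothed gradient rescaled by its running magnitude.
    direction = m_hat / (np.sqrt(v_hat) + eps)
    return direction, m, v


# Toy usage: a fixed "true" gradient corrupted by Gaussian noise at each step
# (an assumption standing in for a noisy likelihood-score approximation).
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])
m = np.zeros_like(true_grad)
v = np.zeros_like(true_grad)
for t in range(1, 51):
    noisy_grad = true_grad + rng.normal(scale=0.5, size=true_grad.shape)
    direction, m, v = adam_moment_step(noisy_grad, m, v, t)
```

In this toy setting the first moment averages out step-to-step noise, while dividing by the square root of the second moment normalizes the step size per coordinate. A sampler would typically scale `direction` by a guidance step size before applying it.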

Paper Structure

This paper contains 11 sections, 11 equations, 18 figures, 9 tables, 5 algorithms.

Figures (18)

  • Figure 1: Left: The KL divergence between each method's empirical distribution and the target distribution as a function of the guidance noise coefficient $\zeta$. Right: Visualization of the empirical and target distributions at $\zeta=0.175$.
  • Figure 2: Qualitative comparison of AdamDPS, DPS, and TFG on the Cats dataset for super resolution at 12x downsampling and Gaussian deblurring at blur intensity 9.
  • Figure 3: Reconstruction performance measured in LPIPS and FID, where lower is better for both. Comparison on the ImageNet and Cats datasets for super resolution at 16x downsampling, Gaussian deblurring at blur intensity 12, and inpainting with a 90% random mask.
  • Figure 4: Class-conditional sampling performance measured in classification accuracy and FID, where higher accuracy and lower FID are better. Accuracy is computed as the harmonic mean across three held-out classifiers. Left & Center: Comparison of plug-and-play methods with a standard classifier on CIFAR-10 and ImageNet, respectively. Right: Comparison of plug-and-play methods with a time-aware classifier on ImageNet.
  • Figure 5: Relative improvement over DPS on ImageNet as task difficulty increases for super resolution and Gaussian deblurring.
  • ...and 13 more figures