ERA-Solver: Error-Robust Adams Solver for Fast Sampling of Diffusion Probabilistic Models

Shengming Li, Luping Liu, Runnan Li, Xu Tan

TL;DR

ERA-Solver tackles slow diffusion-model sampling by casting denoising as a diffusion ODE and introducing an error-robust implicit Adams solver. It replaces fixed-predictor coefficients with a Lagrange interpolation-based predictor and an adaptive noise-basis selection mechanism to accommodate diverse error patterns from pretrained models, all without retraining. The authors show third-order local error and second-order convergence for suitable interpolation order, plus robustness guarantees for basis selection, and demonstrate substantial FID improvements on CIFAR-10, CelebA, LSUN-Church, and ImageNet 64×64 with as few as 10 function evaluations. This training-free, model-agnostic approach significantly accelerates diffusion-based generation while preserving or enhancing sample quality, enabling practical deployment across tasks.

Abstract

Though denoising diffusion probabilistic models (DDPMs) have achieved remarkable generation results, their low sampling efficiency still limits further applications. Since DDPMs can be formulated as diffusion ordinary differential equations (ODEs), various fast sampling methods can be derived from solving diffusion ODEs. However, we notice that previous fast sampling methods with a fixed analytical form are not robust to the various error patterns in the noise estimated by pretrained diffusion models. In this work, we construct an error-robust Adams solver (ERA-Solver), which utilizes the implicit Adams numerical method, consisting of a predictor and a corrector. Unlike the traditional predictor based on explicit Adams methods, we leverage a Lagrange interpolation function as the predictor, which is further enhanced with an error-robust strategy that adaptively selects the Lagrange bases with lower errors in the estimated noise. The proposed solver can be directly applied to any pretrained diffusion model without extra training. Experiments on the CIFAR-10, CelebA, LSUN-Church, and ImageNet 64×64 (conditional) datasets demonstrate that ERA-Solver achieves Fréchet Inception Distance (FID) scores of 3.54, 5.06, 5.02, and 5.11, respectively, for image generation with only 10 network evaluations.
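
To make the idea of a Lagrange interpolation predictor concrete, here is a minimal sketch, not the authors' implementation. It assumes the predictor extrapolates previously computed noise estimates to the next time step by evaluating the Lagrange interpolant through them. The function name `lagrange_predict` and its arguments are hypothetical, and the error-robust selection of which past estimates to use as bases is deliberately left out: the caller simply passes in whichever nodes it has chosen.

```python
import numpy as np


def lagrange_predict(times, values, t_query):
    """Evaluate the Lagrange interpolant through (times[m], values[m]) at t_query.

    times   : sequence of distinct time points (the interpolation nodes)
    values  : sequence of arrays or scalars, one per node (e.g. past noise estimates)
    t_query : time at which the interpolant is evaluated (e.g. the next step)
    """
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    result = np.zeros_like(values[0])
    for m in range(len(times)):
        # Basis polynomial l_m(t): equals 1 at times[m] and 0 at every other node.
        others = np.delete(times, m)
        weight = np.prod((t_query - others) / (times[m] - others))
        result = result + weight * values[m]
    return result


# Illustrative usage with made-up data: three past "estimates" sampled from t**2.
past_times = [0.0, 1.0, 2.0]
past_values = [np.full((2, 2), t ** 2) for t in past_times]
predicted = lagrange_predict(past_times, past_values, 3.0)
print(predicted)
```

Because the basis weights depend only on the node positions, changing which past time steps are passed in changes the effective sampling coefficients. That is the degree of freedom a selection strategy can exploit, in contrast to a predictor whose coefficients are fixed in advance.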

Paper Structure

This paper contains 41 sections, 2 theorems, 31 equations, 10 figures, 12 tables, 1 algorithm.

Key Result

Theorem 1

When the interpolation order satisfies $k \geq 3$, ERA-Solver has third-order local approximation error and second-order convergence.
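
As a reading aid, the two orders in Theorem 1 relate in the way that is standard for ODE solvers. The notation here is an assumption introduced for illustration and is not taken from the paper: $h$ is the largest step size, $x_{t_i}$ the exact ODE solution, and $\tilde{x}_{t_i}$ the numerical iterate. A third-order local error bounds the error committed in a single step, starting from exact values; accumulating it over $\mathcal{O}(1/h)$ steps loses one power of $h$:

$$
\underbrace{\left\| x_{t_{i+1}} - \tilde{x}_{t_{i+1}} \right\| = \mathcal{O}(h^{3})}_{\text{one step, exact starting values}}
\quad\Longrightarrow\quad
\underbrace{\max_{i} \left\| x_{t_i} - \tilde{x}_{t_i} \right\| = \mathcal{O}\!\left(\tfrac{1}{h}\cdot h^{3}\right) = \mathcal{O}(h^{2})}_{\text{all steps}}
$$

Under this reading, halving the step size would be expected to shrink the overall sampling error by roughly a factor of four, provided the solver is stable.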

Figures (10)

  • Figure 1: Generated samples of ERA-Solver and previous fast sampling methods on a text-to-image latent diffusion model [stabledif] and an unconditional pixel-space diffusion model [dhariwal2021diffusion].
  • Figure 2: (left) The main idea of ERA-Solver. ERA-Solver allows flexible sampling coefficients within a unified numerical solver, making it robust to the various error patterns on different data manifolds. (right) Visualization of the errors between the estimated noise and the ground-truth noise on various data manifolds with the same pretrained diffusion model [ddim]. The red bars indicate the statistical variance.
  • Figure 3: The pipeline of ERA-Solver. The sampling scheme is based on the predictor-corrector method for implicit Adams. Our predictor is robust to the errors in the noise estimated by pretrained models. Sampling starts from standard Gaussian noise $x_{t_0}$ and iteratively applies a denoising step (from $x_{t_i}$ to $x_{t_{i+1}}$) to obtain the final generated image.
  • Figure 4: Comparison of $\Delta \boldsymbol{\epsilon}$ between the error-robust selection process and the fixed selection process. $\Delta \boldsymbol{\epsilon}$ is calculated based on Eq. \ref{eq:error_measure} instead of the training loss in Eq. \ref{eq:dsm_loss}, on CIFAR-10 [cifar10]. The sampling NFE is set to 20 and $k$ is set to 5.
  • Figure 5: Generation quality measured by FID $\downarrow$ on various datasets and pretrained DPMs, varying the number of function evaluations (NFE).
  • ...and 5 more figures
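
The predictor-corrector scheme named in the Figure 3 caption can be illustrated on a toy scalar ODE. This is a generic sketch under stated assumptions, not ERA-Solver itself. The predictor guesses the slope at the next time point by linear extrapolation through the two most recent slopes (a two-point Lagrange interpolant), and the corrector applies the trapezoidal rule, which is the second-order implicit Adams (Adams-Moulton) formula, using that predicted slope. The function `pc_solve` and the test problem $dy/dt = -y$ are hypothetical choices made for illustration.

```python
import math


def pc_solve(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with a simple predictor-corrector.

    Predictor: extrapolate the slope at the next time point from past slopes.
    Corrector: trapezoidal (second-order Adams-Moulton) update using that slope.
    Only one new evaluation of f is needed per step.
    """
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    f_prev = None
    f_curr = f(t, y)
    for _ in range(n_steps):
        if f_prev is None:
            # No history on the first step: fall back to constant extrapolation.
            f_pred = f_curr
        else:
            # Two-point Lagrange extrapolation of the slope to t + h.
            f_pred = 2.0 * f_curr - f_prev
        # Implicit Adams (trapezoidal) formula with the predicted slope plugged in.
        y = y + 0.5 * h * (f_curr + f_pred)
        t = t + h
        f_prev, f_curr = f_curr, f(t, y)
    return y


# Illustrative usage: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t).
approx = pc_solve(lambda t, y: -y, 1.0, 0.0, 1.0, 200)
print(approx, math.exp(-1.0))
```

With a fixed step size and this particular two-point predictor, the update is algebraically the same as the explicit two-step Adams-Bashforth method. The separation into predictor and corrector matters once the predictor is swapped for something more flexible, such as a higher-order interpolant over selected past estimates.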

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2