Optimizing Few-Step Sampler for Diffusion Probabilistic Model

Jen-Yuan Huang

TL;DR

An upper bound for the discretization error of the sampling schedule is derived, which can be efficiently optimized with Monte-Carlo estimation, and a two-phase alternating optimization algorithm is proposed.

Abstract

Diffusion Probabilistic Models (DPMs) have demonstrated an exceptional capability for generating high-quality and diverse images, but their practical application is hindered by the intensive computational cost of inference. The DPM generation process requires solving a Probability-Flow Ordinary Differential Equation (PF-ODE), which involves discretizing the integration domain into intervals for numerical approximation. This discretization corresponds to the sampling schedule of a diffusion ODE solver, and we notice that the solution from a first-order solver can be expressed as a convex combination of model outputs at all scheduled time-steps. We derive an upper bound for the discretization error of the sampling schedule, which can be efficiently optimized with Monte-Carlo estimation. Building on these theoretical results, we propose a two-phase alternating optimization algorithm. In Phase-1, the sampling schedule is optimized for the pre-trained DPM; in Phase-2, the DPM is further tuned on the selected time-steps. Experiments on a pre-trained DPM for the ImageNet64 dataset demonstrate that the proposed method consistently improves on the baseline across various numbers of sampling steps.
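The convex-combination observation can be checked numerically. The sketch below is not the paper's implementation: it assumes an EDM-style variance-exploding parameterization in which the PF-ODE reads dx/dsigma = (x - D(x, sigma)) / sigma, it stops at a strictly positive final noise level, and it uses a hypothetical closed-form `toy_denoiser` in place of a pre-trained DPM. Under these assumptions the Euler solution is a convex combination of the initial sample and the model outputs at the scheduled noise levels; all function names are illustrative.

```python
import numpy as np

def toy_denoiser(x, sigma):
    """Hypothetical stand-in for a pre-trained denoiser D(x, sigma).

    This is the exact posterior mean E[x0 | x] when x0 ~ N(0, I)
    and x = x0 + sigma * noise; a real DPM would be a neural network.
    """
    return x / (1.0 + sigma ** 2)

def euler_sample(x_init, sigmas, denoiser):
    """First-order (Euler) solve of dx/dsigma = (x - D(x, sigma)) / sigma."""
    x = x_init
    outputs = []
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        d = denoiser(x, s_cur)
        outputs.append(d)
        # x + (s_next - s_cur) * (x - d) / s_cur, rewritten as an
        # interpolation between the current sample and the model output.
        ratio = s_next / s_cur
        x = ratio * x + (1.0 - ratio) * d
    return x, outputs

def convex_weights(sigmas):
    """Closed-form weights obtained by unrolling the Euler recursion.

    Assumes a decreasing schedule with a strictly positive last entry.
    """
    sigmas = np.asarray(sigmas, dtype=float)
    w_init = sigmas[-1] / sigmas[0]                       # weight of the initial sample
    w_out = sigmas[-1] * (1.0 / sigmas[1:] - 1.0 / sigmas[:-1])  # weight of each model output
    return w_init, w_out

# Illustrative 4-step schedule (values chosen arbitrarily for the sketch).
rng = np.random.default_rng(0)
sigmas = np.array([80.0, 20.0, 5.0, 1.0, 0.1])
x_init = sigmas[0] * rng.standard_normal(4)

x_final, outputs = euler_sample(x_init, sigmas, toy_denoiser)
w_init, w_out = convex_weights(sigmas)
recombined = w_init * x_init + sum(w * d for w, d in zip(w_out, outputs))
```

Here `recombined` reproduces `x_final`, and the weights are non-negative and sum to one. Since the weights depend only on the schedule, changing `sigmas` changes how much each model output contributes to the final sample, which is the quantity a schedule optimizer can act on.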

Paper Structure

This paper contains 20 sections, 1 theorem, 14 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

For $\boldsymbol{\sigma^*}=\mathop{\arg\min}_{\boldsymbol{\sigma}}\mathcal{L}_{disc}\left( \boldsymbol{\sigma} \right)$, the backward difference of the $i$-th element in the sequence $\left\{\boldsymbol{\tau}_{\sigma^*_i}\left(\boldsymbol{x}_{0}\right)\right\}$ equals the first-order increment in $\b

Figures (4)

  • Figure 1: Illustration of the optimal sampling schedule (blue line). The iteration follows backward Euler steps, leading to faster convergence with low global error, whereas a fixed sampling schedule is either sub-optimal or takes equal step sizes in sampling.
  • Figure 2: Finetuning improvement. Finetuning experiments conducted on (Left) a pretrained FFHQ model at 64$\times$64 resolution and (Right) a CIFAR-10 model at 32$\times$32 resolution. Our finetuning achieves consistent improvements in few-step sampling.
  • Figure 3: Sampling schedule. The learned sampling schedule of our method drastically reduces the noise level, skipping sampling steps where little content is generated and allocating computational resources to steps that lead to diverse perceptual details.
  • Figure 4: Weighting schemes. (Left) The explicit weighting scheme $\lambda_i$ in the training objective Eq. \ref{['diffusion_loss', 'loss']}. (Right) The active weighting scheme obtained by including the likelihood term in the Monte-Carlo estimate of Eq. \ref{['diffusion_loss', 'loss']}. The weighting scheme induced by our learned sampling schedule exhibits the same pattern as the empirically chosen one in EDM (Karras et al., 2022), which leads to better results in experiments.

Theorems & Definitions (1)

  • Theorem 1