DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation

Wenliang Zhao; Haolin Wang; Jie Zhou; Jiwen Lu

TL;DR

DC-Solver is a new fast DPM sampler that uses dynamic compensation (DC) to mitigate the misalignment in predictor-corrector samplers. It is paired with a cascade polynomial regression (CPR) that instantly predicts the compensation ratios for unseen sampling configurations.

Abstract

Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during the sampling. Recent predictor-corrector diffusion samplers have significantly reduced the required number of function evaluations (NFE), but inherently suffer from a misalignment issue caused by the extra corrector step, especially with a large classifier-free guidance scale (CFG). In this paper, we introduce a new fast DPM sampler called DC-Solver, which leverages dynamic compensation (DC) to mitigate the misalignment of the predictor-corrector samplers. The dynamic compensation is controlled by compensation ratios that are adaptive to the sampling steps and can be optimized on only 10 datapoints by pushing the sampling trajectory toward a ground truth trajectory. We further propose a cascade polynomial regression (CPR) which can instantly predict the compensation ratios on unseen sampling configurations. Additionally, we find that the proposed dynamic compensation can also serve as a plug-and-play module to boost the performance of predictor-only samplers. Extensive experiments on both unconditional sampling and conditional sampling demonstrate that our DC-Solver can consistently improve the sampling quality over previous methods on different DPMs with a wide range of resolutions up to 1024$\times$1024. Notably, we achieve 10.38 FID (NFE=5) on unconditional FFHQ and 0.394 MSE (NFE=5, CFG=7.5) on Stable-Diffusion-2.1. Code is available at https://github.com/wl-zhao/DC-Solver
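To make the idea of a compensation ratio concrete, the following is a minimal sketch, not the authors' implementation. It assumes the compensation can be modeled as Lagrange interpolation over a few cached model outputs, evaluated at a time shifted by a ratio `rho`, and it swaps in a simple grid search against a known target where the abstract describes an optimization toward a ground truth trajectory. The function names, the toy function `f`, and the time grid are all hypothetical; the paper's exact parameterization may differ.

```python
import numpy as np

def lagrange_eval(times, values, t_query):
    """Evaluate the Lagrange interpolant through (times[k], values[k]) at t_query."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    result = np.zeros_like(values[0])
    for k in range(len(times)):
        weight = 1.0
        for l in range(len(times)):
            if l != k:
                weight *= (t_query - times[l]) / (times[k] - times[l])
        result = result + weight * values[k]
    return result

def compensated_output(times, outputs, rho):
    """Hypothetical compensation: re-read the cached outputs at a shifted time.

    times/outputs hold the last few cached model outputs, oldest first.
    rho = 1 returns the newest output unchanged; rho != 1 moves the query
    point along the line through the two most recent times.
    """
    t_shift = rho * times[-1] + (1.0 - rho) * times[-2]
    return lagrange_eval(times, outputs, t_shift)

def search_rho(times, outputs, target, candidates):
    """Grid search (an assumption): pick the ratio whose output is closest to target."""
    errors = [np.mean((compensated_output(times, outputs, r) - target) ** 2)
              for r in candidates]
    return float(candidates[int(np.argmin(errors))])

# Toy stand-in for a model output that varies smoothly with time (not a real DPM).
f = lambda t: np.array([t ** 2, 1.0 - t])
times = [0.9, 0.7, 0.5]
outputs = [f(t) for t in times]
```

With `rho = 1.0` the sketch returns the most recent cached output, so the uncompensated sampler is recovered as a special case. Calling `search_rho(times, outputs, f(0.46), np.linspace(0.8, 1.4, 7))` selects the candidate closest to 1.2, because that ratio shifts the query time to 0.46 in this toy setup.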

Paper Structure (24 sections, 5 theorems, 53 equations, 9 figures, 7 tables, 2 algorithms)

Introduction
Related Work
Method
Preliminaries: Fast Sampling of DPMs
Better Alignment via Dynamic Compensation
Generalization to Unseen NFE & CFG
Discussion
Experiments
Implementation Details
Main Results
Ablation study
More Analyses
Conclusions
Detailed Background of Diffusion Models
Diffusion Models
...and 9 more sections

Key Result

theorem 4

For any DPM sampler with $(p+1)$-th order of accuracy, i.e., $\mathbb{E}\|\tilde{\boldsymbol x}_{t_{i+1}}^{\rm c} - \tilde{\boldsymbol x}_{t_{i+1}}\|_2 \le C h_i^{p+2}$, applying dynamic compensation with the ratio $\rho_i^*$ will reduce the local truncation error and retain the $(p+1)$-th order of accuracy.

Figures (9)

Figure 1: The main idea of DC-Solver. (a) Searching. We propose dynamic compensation (DC) to mitigate the misalignment issue in the predictor-corrector diffusion sampler. The compensation is controlled by the ratios $\{\rho_i\}$ which are adaptive to the sampling step and can be optimized by pushing the sampling trajectory toward the ground truth trajectory on only 10 datapoints. (b) Sampling. The compensation ratios can be either efficiently searched as in (a) or instantly predicted by the cascade polynomial regression (CPR) given the desired NFE and CFG.
Figure 2: Qualitative comparisons on Stable-Diffusion-2.1. Images above are sampled from SD2.1 (768$\times$768) using the text prompt "A photo of a serene coastal cliff with waves crashing against the rocks below" with a classifier-free guidance scale of 7.5 and only 5 function evaluations (NFE). We provide the generated images from 4 random initial noises for each method. We show that DC-Solver is able to generate high-resolution and photo-realistic images with more details. Best viewed in color.
Figure 3: Relationship between compensation ratios and CFG/NFE. Using the widely used Stable-Diffusion-1.5 [rombach2022high], we search for the optimal compensation ratios at different CFG and NFE and find that the ratios evolve continuously as CFG/NFE vary.
Figure 4: Unconditional sampling results. We compare our DC-Solver with previous methods on FFHQ [karras2019ffhq], LSUN-Church [yu2015lsun], and LSUN-Bedroom [yu2015lsun]. The FID$\downarrow$ on different numbers of function evaluations (NFE) is used to measure the sampling quality. We show that DC-Solver significantly outperforms other methods, especially with few NFE.
Figure 5: Conditional sampling results. We compare the sampling quality of different methods using Stable-Diffusion-1.5 with classifier-free guidance (CFG) varying from 1.5 to 7.5. The sampling quality is measured by the mean squared error (MSE$\downarrow$) between the generated latents and the ground truth latents obtained by a 999-step DDIM. We randomly select 10K captions from MS-COCO2014 as the text prompts. We observe that DC-Solver consistently achieves better sampling quality across different NFE/CFG.
...and 4 more figures
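
The Figure 3 observation that the ratios vary continuously with CFG/NFE is what makes a regression-based predictor plausible. As a rough illustration only, the sketch below fits a two-stage ("cascade") polynomial: first a polynomial over the step index for each CFG, then a polynomial over CFG for each first-stage coefficient. The staging, the degrees, the fixed number of steps, and the names `fit_cascade` and `predict_ratios` are assumptions made for illustration and are not taken from the paper. The training values come from a made-up smooth function `toy_ratio`, not from any measured compensation ratios.

```python
import numpy as np

def fit_cascade(cfgs, ratios_per_cfg, deg_step=2, deg_cfg=2):
    """Fit a two-stage polynomial model (hypothetical structure).

    ratios_per_cfg has shape (n_cfg, n_steps): one row of ratios per CFG value.
    Returns an array of shape (deg_cfg + 1, deg_step + 1).
    """
    steps = np.arange(ratios_per_cfg.shape[1], dtype=float)
    # Stage 1: for each CFG, a polynomial in the step index.
    stage1 = np.stack([np.polyfit(steps, row, deg_step) for row in ratios_per_cfg])
    # Stage 2: each stage-1 coefficient becomes a polynomial in CFG.
    return np.polyfit(np.asarray(cfgs, dtype=float), stage1, deg_cfg)

def predict_ratios(stage2, cfg, n_steps):
    """Predict per-step ratios for a CFG value that was not in the training set."""
    coeffs = np.array([np.polyval(stage2[:, j], cfg) for j in range(stage2.shape[1])])
    return np.polyval(coeffs, np.arange(n_steps, dtype=float))

# Synthetic, made-up data: a smooth function of CFG and step index (not real ratios).
toy_ratio = lambda cfg, step: 1.0 + 0.01 * cfg * step - 0.002 * step ** 2
cfgs = np.array([1.5, 3.0, 6.0, 7.5])
train = np.array([[toy_ratio(c, s) for s in range(5)] for c in cfgs])

stage2 = fit_cascade(cfgs, train)
pred = predict_ratios(stage2, 4.5, 5)  # CFG=4.5 was not seen during fitting
```

Because `toy_ratio` is itself a low-degree polynomial, the fit reproduces it at the held-out CFG up to floating-point error. Real ratios would only be approximated, and handling a varying NFE would need an additional stage that this sketch omits.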

Theorems & Definitions (10)

theorem 4
proof
corollary
proof
theorem 5
proof
corollary
proof
theorem 6
proof
