Elucidating the solution space of extended reverse-time SDE for diffusion models

Qinpeng Cui; Xinyi Zhang; Qiqi Bao; Qingmin Liao


TL;DR

This paper addresses the speed–quality dilemma in diffusion-model sampling by unifying ODE and SDE approaches under an Extended Reverse-Time SDE (ER SDE) framework. It exploits the semi-linear structure of ER SDE solutions to derive exact solutions and high-stage approximations for both VE SDE and VP SDE. It then analyzes the resulting one-step prediction errors to explain why ODE solvers excel at low NFE while SDE solvers reach higher quality as NFE grows. By exploring the ER SDE solution space through carefully chosen noise scale functions $\phi(\cdot)$, the authors design ER-SDE-Solvers that achieve rapid, high-quality sampling and state-of-the-art performance among training-free stochastic samplers (e.g., 8.33 FID on ImageNet $128\times128$ with $\mathrm{NFE}=20$). The results point to a practical path toward fast yet high-fidelity diffusion-based generation, with classifier guidance further improving efficiency at higher resolutions. The work connects ODE and SDE dynamics in both theory and practice, enables versatile solvers, and offers concrete guidelines for designing noise scale functions.

Abstract

Sampling from diffusion models can alternatively be viewed as solving differential equations, which poses a challenge in balancing speed and image quality. ODE-based samplers are fast but plateau in quality, whereas SDE-based samplers achieve superior quality at the cost of more iterations. In this work, we formulate the sampling process as an Extended Reverse-Time SDE (ER SDE), unifying prior explorations of ODEs and SDEs. Theoretically, leveraging the semi-linear structure of ER SDE solutions, we offer exact solutions and approximate solutions for both VE SDE and VP SDE. By analyzing the errors of these approximate solutions in the solution space of the ER SDE, referred to as one-step prediction errors, we obtain mathematical insights that explain the rapid sampling of ODE solvers and the high-quality sampling of SDE solvers. Additionally, we show that VP SDE solvers are on par with their VE SDE counterparts. Building on these findings and combining the advantages of ODE and SDE solvers, we devise efficient high-quality samplers, namely ER-SDE-Solvers. Experimental results demonstrate that ER-SDE-Solvers achieve state-of-the-art performance among all stochastic samplers while maintaining the efficiency of deterministic samplers. Specifically, on the ImageNet $128\times128$ dataset, ER-SDE-Solvers obtain 8.33 FID in only 20 function evaluations. Code is available at https://github.com/QinpengCui/ER-SDE-Solver
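The role of the noise scale function $\phi(\cdot)$ can be illustrated with a first-order update. The sketch below is a minimal illustration, not the authors' implementation (that lives in the linked repository). It assumes the first-order VE ER-SDE-Solver step has the form $x_{\sigma_{i-1}} = r\,x_{\sigma_i} + (1-r)\,x_\theta(x_{\sigma_i},\sigma_i) + \sqrt{\sigma_{i-1}^2 - r^2\sigma_i^2}\,z$ with $r = \phi(\sigma_{i-1})/\phi(\sigma_i)$. Under that form, $\phi(\sigma)=\sigma$ injects no noise (a DDIM-like ODE step) and $\phi(\sigma)=\sigma^2$ gives an ancestral, SDE-like step. The function names, the geometric noise levels, and the toy one-dimensional Gaussian data (chosen because its denoiser $x_\theta$ is known in closed form) are hypothetical.

```python
import numpy as np

def ve_er_sde_step(x, sigma_cur, sigma_next, denoise, phi, rng):
    """One first-order VE step from noise level sigma_cur down to sigma_next (assumed form)."""
    x0_hat = denoise(x, sigma_cur)                 # data prediction x_theta(x, sigma)
    r = phi(sigma_next) / phi(sigma_cur)           # ratio of noise scale functions
    # Variance of the injected noise; non-negative whenever phi(s) / s is non-decreasing.
    var = max(sigma_next**2 - (sigma_cur * r) ** 2, 0.0)
    z = rng.standard_normal(np.shape(x))
    return r * x + (1.0 - r) * x0_hat + np.sqrt(var) * z

def sample(phi, n=20000, s0=1.0, sigma_max=80.0, sigma_min=0.002, steps=500, seed=0):
    """Toy sampler for 1-D Gaussian data N(0, s0^2), whose denoiser is known exactly."""
    rng = np.random.default_rng(seed)
    denoise = lambda x, s: (s0**2 / (s0**2 + s**2)) * x          # E[x_0 | x_sigma]
    sigmas = np.append(np.geomspace(sigma_max, sigma_min, steps), 0.0)
    x = np.sqrt(s0**2 + sigma_max**2) * rng.standard_normal(n)  # exact marginal at sigma_max
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = ve_er_sde_step(x, s_cur, s_next, denoise, phi, rng)
    return x

ode_phi = lambda s: s       # phi(sigma) = sigma: zero injected noise, a DDIM-like ODE step
sde_phi = lambda s: s**2    # phi(sigma) = sigma^2: ancestral, reverse-SDE-like step

print(np.std(sample(ode_phi)), np.std(sample(sde_phi)))   # both close to s0 = 1.0
```

Both choices keep $\phi(\sigma)/\sigma$ non-decreasing, so the noise variance never goes negative. Other noise scale functions, such as the customized ones the paper studies, can be passed in the same way.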


Paper Structure (36 sections, 5 theorems, 82 equations, 16 figures, 7 tables, 6 algorithms)


Introduction
Diffusion Models
Forward Diffusion SDEs
Reverse Diffusion SDEs
Reverse Diffusion ODEs
Extended Reverse-Time SDE Solvers
Extended Reverse-Time SDE
VE ER-SDE-Solvers
VP ER-SDE-Solvers
Elucidating the Solution Space of ER SDE
Insights about the Solution Space of ER SDE
Customized Efficient High-Quality ER-SDE-Solvers
Experiments
Different Stages of VE and VP ER-SDE-Solvers
Comparisons with Other Training-Free Methods
...and 21 more sections

Key Result

Proposition 3.1

When $\boldsymbol{s}_{\theta}(\boldsymbol{x}_{t}, t)=\nabla_{\boldsymbol{x}} \log p_{t}\left(\boldsymbol{x}_{t}\right)$ for all $\boldsymbol{x}_{t}$ and $\overline p_{T}\left(\boldsymbol{x}_{T}\right)=p_{T}\left(\boldsymbol{x}_{T}\right)$, the marginal distribution $\overline p_t(\boldsymbol{x}_{t})$ of the ER SDE matches the marginal $p_t(\boldsymbol{x}_{t})$ of the forward diffusion SDE for all $0 \le t \le T$.
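Proposition 3.1 can be checked numerically on a toy problem. The sketch assumes the ER SDE has the form $\mathrm{d}\boldsymbol{x} = [f(t)\boldsymbol{x} - \frac{g^2(t)+h^2(t)}{2}\nabla_{\boldsymbol{x}}\log p_t(\boldsymbol{x})]\,\mathrm{d}t + h(t)\,\mathrm{d}\overline{\boldsymbol{w}}$ with a free diffusion coefficient $h(t)$. It specializes this to a VE process ($f=0$, $g^2\,\mathrm{d}t=\mathrm{d}\sigma^2$) with the hypothetical choice $h=\eta g$, and it uses one-dimensional Gaussian data so that the exact score is available. After Euler–Maruyama integration, the final spread should match the data distribution for every $\eta$, up to discretization and Monte Carlo error. Here $\eta=0$ recovers the probability-flow ODE and $\eta=1$ the usual reverse-time SDE. The function name and all numerical settings are illustrative assumptions.

```python
import numpy as np

def er_sde_sample(eta, n=20000, s0=1.0, sigma_max=10.0, sigma_min=0.002, steps=1000, seed=0):
    """Euler-Maruyama integration of a VE-form ER SDE for 1-D Gaussian data N(0, s0^2).

    With sigma as the time variable (g^2 dt = d sigma^2) and h = eta * g, one backward
    step of size D = sigma_cur^2 - sigma_next^2 reads
        x <- x + (1 + eta^2) / 2 * score(x, sigma_cur) * D + eta * sqrt(D) * z.
    """
    rng = np.random.default_rng(seed)
    score = lambda x, s: -x / (s0**2 + s**2)     # exact score of p_sigma = N(0, s0^2 + sigma^2)
    sigmas = np.geomspace(sigma_max, sigma_min, steps)
    x = np.sqrt(s0**2 + sigma_max**2) * rng.standard_normal(n)   # start from the exact p_T
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        D = s_cur**2 - s_next**2
        drift = 0.5 * (1.0 + eta**2) * score(x, s_cur) * D
        x = x + drift + eta * np.sqrt(D) * rng.standard_normal(n)
    return x

# eta = 0: probability-flow ODE; eta = 1: standard reverse-time SDE.
for eta in (0.0, 0.5, 1.0):
    print(eta, np.std(er_sde_sample(eta)))    # each close to sqrt(s0^2 + sigma_min^2), about 1.0
```

The $h^2$ terms in the drift and the diffusion cancel in the Fokker–Planck equation. That cancellation is why the same marginals are obtained for any $\eta$, and it is the freedom the ER SDE solution space exploits.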

Figures (16)

Figure 1: Sample quality (measured by FID$\downarrow$) on ImageNet $64\times64$ versus number of function evaluations (NFE) for deterministic samplers (DDIM song2021denoising, EDM-Deterministic karras2022elucidating, DPM-Solver-3 lu2022dpm) and stochastic samplers (DDIM ($\eta=1$), EDM-Stochastic, Ours). Deterministic samplers are fast but plateau at mediocre quality even with a large NFE, whereas stochastic samplers keep improving image quality as NFE increases. Our efficient high-quality samplers achieve state-of-the-art performance among all stochastic samplers while maintaining sampling efficiency comparable to that of deterministic samplers.
Figure 2: A unified framework for DMs: The forward process described by an SDE transforms real data into noise, while the reverse process characterized by an ER SDE generates real data from noise. Once the score function $\nabla_{\mathbf{x}} \log p_t(\mathbf{x}_t)$ is estimated by a neural network, solving the ER SDE enables the generation of high-quality samples.
Figure 3: FIE coefficients (a) and FID scores (b) versus NFE for different noise scale functions, using a 1st-order solver with the pretrained EDM. In the solution space of the ER SDE, the ODE solver shows the smallest one-step prediction errors. ER SDE 4 shows elevated error in the first 100 NFE and then gradually converges to the ODE's error profile. Thus, ER SDE 4 is about as efficient as the ODE solver while generating higher-quality images. Image quality deteriorates for ill-suited noise scale functions (such as ER SDE 2).
Figure 4: FID (NFE=20) on CIFAR-10 with the pretrained EDM, varying with the number of integration points. As the number of integration points $N$ increases, FID scores initially show a decreasing trend, reaching a minimum at $N=200$. Subsequently, FID scores slowly increase, indicating a decrease in image generation quality.
Figure 5: FIE coefficients with the pretrained model EDM (a) and Guided-diffusion (b) (linear noise schedule), varying with NFE. Different step size schedules and noise schedules used in the pretrained models lead to variations in the shape of the FIE-NFE curves.
...and 11 more figures

Theorems & Definitions (5)

Proposition 3.1: The validity of the ER SDE, proof in Supp.1.1
Proposition 3.2: Exact solution of the VE SDE, proof in Supp.1.2
Proposition 3.3: High-stage approximations of the VE SDE, proof in Supp.1.3
Proposition 3.4: Exact solution of the VP SDE, proof in Supp.1.4
Proposition 3.5: High-stage approximations of the VP SDE, proof in Supp.1.5
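The abstract's claim that VP SDE solvers are on par with their VE counterparts is consistent with a simple change of variables. For $\boldsymbol{x}_t = \alpha_t\boldsymbol{x}_0 + \sigma_t\boldsymbol{\epsilon}$, the rescaled variable $\boldsymbol{x}_t/\alpha_t$ follows a VE process with noise level $\lambda_t = \sigma_t/\alpha_t$. The sketch below assumes the first-order VP update is the VE update applied in these rescaled variables, with $r=\phi(\lambda_{t_{i-1}})/\phi(\lambda_{t_i})$. It first checks the expanded VP formula against the rescaled VE step and then samples toy Gaussian data. The function names, the schedule ($\alpha^2+\sigma^2=1$ with geometrically spaced $\lambda$), and the data are hypothetical. The code is not a transcription of Propositions 3.4 and 3.5.

```python
import numpy as np

def ve_step(y, lam_cur, lam_next, x0_hat, phi, z):
    """First-order VE update in noise level lambda (lam_next < lam_cur)."""
    r = phi(lam_next) / phi(lam_cur)
    var = max(lam_next**2 - (lam_cur * r) ** 2, 0.0)
    return r * y + (1.0 - r) * x0_hat + np.sqrt(var) * z

def vp_step(x, a_cur, s_cur, a_next, s_next, x0_hat, phi, z):
    """The same update written directly in VP variables x_t = alpha_t * x_0 + sigma_t * eps."""
    lam_cur, lam_next = s_cur / a_cur, s_next / a_next    # VE-equivalent noise levels
    r = phi(lam_next) / phi(lam_cur)
    var = max(lam_next**2 - (lam_cur * r) ** 2, 0.0)
    return (a_next / a_cur) * r * x + a_next * (1.0 - r) * x0_hat + a_next * np.sqrt(var) * z

def vp_sample(phi, n=20000, s0=1.0, steps=500, seed=0):
    """Toy VP sampler for 1-D Gaussian data N(0, s0^2) with the exact denoiser."""
    rng = np.random.default_rng(seed)
    lams = np.geomspace(80.0, 0.002, steps)       # lambda = sigma / alpha, decreasing
    alphas = 1.0 / np.sqrt(1.0 + lams**2)         # enforces alpha^2 + sigma^2 = 1
    sigmas = lams * alphas
    x = np.sqrt(alphas[0]**2 * s0**2 + sigmas[0]**2) * rng.standard_normal(n)
    for i in range(steps - 1):
        x0_hat = alphas[i] * s0**2 / (alphas[i]**2 * s0**2 + sigmas[i]**2) * x   # E[x_0 | x_t]
        z = rng.standard_normal(n)
        x = vp_step(x, alphas[i], sigmas[i], alphas[i + 1], sigmas[i + 1], x0_hat, phi, z)
    return x

# The VP step equals alpha_next times the VE step applied to x / alpha_cur.
rng = np.random.default_rng(0)
x, x0_hat, z = rng.standard_normal((3, 8))
a_cur, a_next = 0.6, 0.9
s_cur, s_next = np.sqrt(1 - a_cur**2), np.sqrt(1 - a_next**2)
phi = lambda lam: lam**2
lhs = vp_step(x, a_cur, s_cur, a_next, s_next, x0_hat, phi, z)
rhs = a_next * ve_step(x / a_cur, s_cur / a_cur, s_next / a_next, x0_hat, phi, z)
print(np.allclose(lhs, rhs))        # True
print(np.std(vp_sample(phi)))       # close to the data std s0 = 1.0
```

Under this assumption, any noise scale function designed for the VE solver carries over to VP models unchanged once time is measured by $\lambda = \sigma/\alpha$.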
