Improved Convergence of Score-Based Diffusion Models via Prediction-Correction

Francesco Pedrotti, Jan Maas, Marco Mondelli

TL;DR

This work tackles a theoretical bottleneck in score-based diffusion models: convergence guarantees traditionally require the forward-process time horizon $T_1$ to go to infinity. It introduces a finite-time predictor-corrector scheme: run the forward OU process up to a fixed $T_1$, apply an inexact Langevin correction to sample from $p_{T_1}$ using the estimated score $s_\theta$, and then run a deterministic reverse process with the learned score to recover $p$. The authors establish Wasserstein convergence bounds on $W_2(p,p_\theta)$ that depend only logarithmically on the ambient dimension and the subgaussian norm of the data, under mild assumptions and using only the integrated $L^2$ score loss; they also develop a mechanism to bound a stronger tail loss $\epsilon_{MGF}$ via a truncation strategy. The paper further shows convergence of the discretized scheme in total variation, highlighting practical stability and reduced computational cost, and contrasts these results with prior work that required $T_1 \to \infty$ or imposed stronger score-estimation assumptions. Overall, the results provide a principled, scalable pathway to reliable sampling with SGMs by fixing the forward-time duration and leveraging a two-stage correction process.

Abstract

Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to (i) run a forward process for time $T_1$ by adding noise to the data, (ii) estimate its score function, and (iii) use such estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis paradigm requires $T_1\to\infty$. This is however problematic: from a theoretical viewpoint, for a given precision of the score approximation, the convergence guarantee fails as $T_1$ diverges; from a practical viewpoint, a large $T_1$ increases computational costs and leads to error propagation. This paper addresses the issue by considering a version of the popular predictor-corrector scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then revert the process. Our key technical contribution is to provide convergence guarantees which require to run the forward process only for a fixed finite time $T_1$. Our bounds exhibit a mild logarithmic dependence on the input dimension and the subgaussian norm of the target distribution, have minimal assumptions on the data, and require only to control the $L^2$ loss on the score approximation, which is the quantity minimized in practice.
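The three stages described above (forward noising up to a fixed $T_1$, an approximate Langevin correction targeting $p_{T_1}$, then a reverse pass) can be sketched on a toy problem. The following is a minimal illustration under stated assumptions, not the authors' implementation: the target is a one-dimensional Gaussian $N(m, s^2)$ so that the score of $p_t$ is known in closed form and stands in for the learned score $s_\theta$; the forward process is assumed to be normalized as $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$; the deterministic reverse step is assumed to be the probability-flow ODE; and all function names, step sizes, and default values are hypothetical.

```python
import numpy as np


def ou_moments(t, m, s):
    """Mean and variance of p_t for p = N(m, s^2) under dX = -X dt + sqrt(2) dB.

    The OU normalization is an assumption of this sketch.
    """
    mean = m * np.exp(-t)
    var = s**2 * np.exp(-2.0 * t) + 1.0 - np.exp(-2.0 * t)
    return mean, var


def score(x, t, m, s):
    """Exact score of p_t; a stand-in for the learned score s_theta."""
    mean, var = ou_moments(t, m, s)
    return -(x - mean) / var


def predictor_corrector_sample(n, m=2.0, s=0.5, T1=1.0, T2=5.0,
                               tau=1e-2, h=1e-2, seed=0):
    """Toy finite-T1 sampler: Langevin correction at time T1, then reverse ODE."""
    rng = np.random.default_rng(seed)

    # Start from the stationary law N(0, 1) of the forward OU process.
    y = rng.standard_normal(n)

    # Langevin dynamics targeting p_{T1}, run for time T2 (Euler-Maruyama).
    for _ in range(int(round(T2 / h))):
        noise = rng.standard_normal(n)
        y = y + h * score(y, T1, m, s) + np.sqrt(2.0 * h) * noise

    # Deterministic reverse pass from T1 down to the early stopping time tau.
    # The forward flow has velocity -(x + score), so stepping backward in time
    # adds h * (x + score). Using the probability-flow ODE is an assumption.
    x, t = y, T1
    for _ in range(int(round((T1 - tau) / h))):
        x = x + h * (x + score(x, t, m, s))
        t -= h
    return x


samples = predictor_corrector_sample(20000)
print(samples.mean(), samples.std())
```

With these defaults the printed mean and standard deviation should land close to the target values $m = 2$ and $s = 0.5$, up to discretization error, the early stopping offset, and Monte Carlo noise. The point of the sketch is that the forward horizon `T1` stays fixed and moderate, while the Langevin time `T2` absorbs the mismatch between the stationary distribution and $p_{T_1}$.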

Paper Structure (22 sections, 20 theorems, 118 equations, 2 figures, 1 table)

Key Result

Theorem 4.1

Let the paper's assumptions (first through last) hold, and let $p_t$ be obtained via the forward OU process. Pick $0<\delta<1$, $T_2>0$, $T_1\geq \frac{1}{2}\log\left(2+172\frac{\lVert X\rVert_{\mathrm{SG}}^2}{\delta}+\frac{d}{2\delta}\right)$ and a small early stopping time $0<\tau\leq\dots$ Then, the distance between the output $p_\theta$ and the target distribution $p$ can be bounded as $\dots$
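To get a feel for how mild the requirement on $T_1$ is, the threshold can be evaluated numerically. The sketch below assumes the condition reads $T_1 \geq \frac{1}{2}\log\left(2 + 172\,\lVert X\rVert_{\mathrm{SG}}^2/\delta + d/(2\delta)\right)$, which is how the statement above is read here; the function name and the example values of $d$, $\lVert X\rVert_{\mathrm{SG}}$ and $\delta$ are hypothetical.

```python
import math


def t1_lower_bound(d, sg_norm, delta):
    """Evaluate 0.5 * log(2 + 172 * sg_norm^2 / delta + d / (2 * delta)).

    d       : ambient dimension
    sg_norm : subgaussian norm of the data (hypothetical example input)
    delta   : parameter with 0 < delta < 1
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 0.5 * math.log(2.0 + 172.0 * sg_norm**2 / delta + d / (2.0 * delta))


# The threshold grows only logarithmically as the dimension increases.
for d in (10, 1_000, 100_000):
    print(d, t1_lower_bound(d, sg_norm=1.0, delta=0.5))
```

Since the dimension enters only through the logarithm, multiplying $d$ by 100 increases the threshold by at most $\frac{1}{2}\log 100 \approx 2.3$.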

Figures (2)

  • Figure 1: Simulation results for an asymmetric mixture of two Gaussians (left), the two moons dataset (center) and the rescaled Swiss roll (right). For fixed $T_1$ and variable $T_2$, we plot in blue the $W_2$ distance between the perturbed measure $p_{T_1}$ and the output of the approximate Langevin dynamics, while we plot in orange the $W_2$ distance between the true distribution $p$ and the output of the algorithm $p_\theta$. As expected, both quickly decrease as $T_2$ increases.
  • Figure 2: The confinement region for $\nabla \log p_{T_1}$.

Theorems & Definitions (39)

  • Theorem 4.1
  • Theorem 4.2
  • Proposition 4.3
  • Lemma 4.4
  • Definition 4.5
  • Remark 4.6
  • Theorem 4.7
  • Lemma 4.8
  • Theorem 5.1
  • Remark 5.2
  • ...and 29 more