Improved Convergence of Score-Based Diffusion Models via Prediction-Correction
Francesco Pedrotti, Jan Maas, Marco Mondelli
TL;DR
This work tackles a theoretical bottleneck in score-based diffusion models: convergence guarantees traditionally require the running time $T_1$ of the forward process to go to infinity. It introduces a finite-time predictor-corrector scheme: run the forward OU process up to a fixed $T_1$, perform an inexact Langevin correction to sample from $p_{T_1}$ using the estimated score $s_\theta$, and then run a deterministic reverse process driven by the learned score to recover $p$. The authors establish Wasserstein convergence bounds on $W_2(p,p_\theta)$ that depend only logarithmically on the ambient dimension and the subgaussian norm of the data, under mild assumptions and using only the integrated $L^2$ score loss; they also develop a mechanism to bound a stronger tail loss $\epsilon_{MGF}$ via a truncation strategy. The paper further shows convergence in total variation for the discretized scheme, highlighting practical stability and reduced computational cost, and contrasts these results with prior work that required $T_1 \to \infty$ or imposed stronger score-estimation assumptions. Overall, the results provide a principled, scalable route to reliable sampling with SGMs by fixing the duration of the forward process and using a two-stage correction procedure.
Abstract
Score-based generative models (SGMs) are powerful tools to sample from complex data distributions. Their underlying idea is to (i) run a forward process for time $T_1$ by adding noise to the data, (ii) estimate its score function, and (iii) use this estimate to run a reverse process. As the reverse process is initialized with the stationary distribution of the forward one, the existing analysis paradigm requires $T_1\to\infty$. This is, however, problematic: from a theoretical viewpoint, for a given precision of the score approximation, the convergence guarantee fails as $T_1$ diverges; from a practical viewpoint, a large $T_1$ increases computational costs and leads to error propagation. This paper addresses the issue by considering a version of the popular predictor-corrector scheme: after running the forward process, we first estimate the final distribution via an inexact Langevin dynamics and then reverse the process. Our key technical contribution is to provide convergence guarantees that require running the forward process only for a fixed finite time $T_1$. Our bounds exhibit a mild logarithmic dependence on the input dimension and the subgaussian norm of the target distribution, make minimal assumptions on the data, and require control only of the $L^2$ loss on the score approximation, which is the quantity minimized in practice.
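
The scheme described above (initialize at the stationary distribution, correct towards $p_{T_1}$ with Langevin dynamics, then reverse) can be illustrated with a minimal one-dimensional sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the forward process is normalized as the OU process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$ with stationary law $N(0,1)$, the data distribution is a Gaussian $N(m, s^2)$ so that the score of $p_t$ is available in closed form (standing in for a learned $s_\theta$), the reverse step is an Euler discretization of the probability flow ODE, and the function names and step sizes are hypothetical.

```python
import numpy as np

def gaussian_score(x, t, m, s):
    """Exact score of p_t when the data are N(m, s^2) and the forward
    process is dX = -X dt + sqrt(2) dW (assumed normalization)."""
    mean_t = m * np.exp(-t)
    var_t = s**2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
    return -(x - mean_t) / var_t

def predictor_corrector_sample(score, T1, n_samples,
                               n_langevin=500, h=0.01,
                               n_reverse=500, seed=0):
    """Toy sampler: stationary init -> Langevin correction -> reverse ODE."""
    rng = np.random.default_rng(seed)

    # Start from the stationary distribution N(0, 1) of the forward process.
    x = rng.standard_normal(n_samples)

    # Correction: unadjusted Langevin steps targeting p_{T1},
    # using the score at the fixed time T1.
    for _ in range(n_langevin):
        noise = rng.standard_normal(n_samples)
        x = x + h * score(x, T1) + np.sqrt(2 * h) * noise

    # Reverse: Euler steps of the probability flow ODE run backwards,
    # dy/dtau = y + score(y, T1 - tau), from tau = 0 to tau = T1.
    dt = T1 / n_reverse
    for k in range(n_reverse):
        t = T1 - k * dt
        x = x + dt * (x + score(x, t))
    return x

# Hypothetical toy target: N(2, 0.5^2), with a finite forward time T1 = 1.
m, s = 2.0, 0.5
samples = predictor_corrector_sample(
    lambda x, t: gaussian_score(x, t, m, s), T1=1.0, n_samples=20000
)
print(samples.mean(), samples.std())
```

In this toy setting the sample mean and standard deviation should land close to $m$ and $s$ even though $T_1 = 1$ is far from the regime where $p_{T_1}$ is indistinguishable from $N(0,1)$; the Langevin stage is what closes that gap. With a learned score, both stages would inherit the score-estimation error that the paper's bounds are designed to control.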
