Instance-dependent Convergence Theory for Diffusion Models
Yuchen Jiao, Gen Li
TL;DR
This work analyzes diffusion-model samplers under a relaxed, instance-dependent smoothness condition and proves a total variation (TV) convergence rate that adapts to the non-uniform Lipschitz constant $L$ of the score functions. Using a randomized midpoint sampling scheme and a set of auxiliary processes, the authors derive an $L$-adaptive iteration bound of $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ (up to logarithmic factors), where $d$ is the data dimension and $\varepsilon$ is the target accuracy in TV distance. The bound holds for general target distributions and is particularly favorable for Gaussian mixtures, where $L$ scales only logarithmically with the number of components, the dimension, and the number of iterations. The analysis also covers parallel sampling, giving guidance on the number of processors and rounds needed to reach $\varepsilon$-accuracy in TV distance. The techniques combine probability-flow ODE discretization, KL-based error control, and typical-set arguments to handle non-uniform Lipschitz smoothness. Together, the results give performance guarantees for diffusion samplers across a broad range of target distributions under weaker smoothness assumptions, with potential impact on scalable generative modeling and algorithm design.
Abstract
Score-based diffusion models have demonstrated outstanding empirical performance in machine learning and artificial intelligence, particularly in generating high-quality new samples from complex probability distributions. Improving the theoretical understanding of diffusion models, with a particular focus on convergence analysis, has attracted significant attention. In this work, we develop a convergence rate that adapts to the smoothness of different target distributions, referred to as an instance-dependent bound. Specifically, we establish an iteration complexity of $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ (up to logarithmic factors), where $d$ denotes the data dimension and $\varepsilon$ quantifies the output accuracy in terms of total variation (TV) distance. In addition, $L$ represents a relaxed Lipschitz constant which, in the case of Gaussian mixture models, scales only logarithmically with the number of components, the dimension, and the number of iterations, demonstrating broad applicability.
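To make the three regimes of the stated bound concrete, the following minimal sketch evaluates $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ and reports which term is active. The function name `iteration_bound` and the sample values of $d$, $L$, and $\varepsilon$ are illustrative assumptions, not taken from the paper. The sketch also ignores the logarithmic factors and sets the unspecified constant to 1, so the numbers indicate scaling only.

```python
import math


def iteration_bound(d, L, eps):
    """Evaluate min{d, d^(2/3) L^(1/3), d^(1/3) L} * eps^(-2/3).

    Hypothetical helper for illustration: logarithmic factors are dropped
    and the constant is taken to be 1, so the value shows scaling only.
    Returns (label of the smallest term, resulting bound).
    """
    if d <= 0 or L <= 0 or not (0 < eps < 1):
        raise ValueError("expected d > 0, L > 0 and 0 < eps < 1")
    terms = {
        "d": float(d),
        "d^(2/3) L^(1/3)": d ** (2 / 3) * L ** (1 / 3),
        "d^(1/3) L": d ** (1 / 3) * L,
    }
    # Pick the smallest of the three candidate terms.
    active = min(terms, key=terms.get)
    return active, terms[active] * eps ** (-2 / 3)


if __name__ == "__main__":
    # Illustrative values only: small, moderate, and large L for fixed d.
    for L in (1.0, 100.0, 1e6):
        label, bound = iteration_bound(d=1000, L=L, eps=0.1)
        print(f"L={L:g}: active term {label}, bound ~ {bound:.1f}")
```

Comparing the terms pairwise shows where each one takes over: $d^{1/3}L$ is smallest when $L \le d^{1/2}$, $d^{2/3}L^{1/3}$ when $d^{1/2} \le L \le d$, and $d$ when $L \ge d$. When $L$ grows only logarithmically, as the abstract states for Gaussian mixtures, the first regime applies for all but very small $d$.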
