Instance-dependent Convergence Theory for Diffusion Models

Yuchen Jiao, Gen Li

TL;DR

This work analyzes diffusion-model samplers under a relaxed, instance-dependent smoothness condition and proves a convergence rate in total variation (TV) distance that adapts to the non-uniform Lipschitz constant $L$ of the score functions. Using a randomized midpoint sampling scheme and a set of auxiliary processes, the authors derive an $L$-adaptive iteration complexity of $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ (up to logarithmic factors), where $d$ is the data dimension and $\varepsilon$ is the target accuracy. The bound holds for general target distributions and is particularly favorable for Gaussian mixtures, where $L$ scales only logarithmically. The analysis also covers parallel sampling, giving guidance on the number of processors and rounds needed to reach $\varepsilon$-accuracy in TV distance. The techniques combine probability-flow ODE discretization, KL-based error control, and typical-set arguments to handle non-uniform Lipschitz smoothness.

Abstract

Score-based diffusion models have demonstrated outstanding empirical performance in machine learning and artificial intelligence, particularly in generating high-quality new samples from complex probability distributions. Improving the theoretical understanding of diffusion models, with a particular focus on convergence analysis, has attracted significant attention. In this work, we develop a convergence rate that is adaptive to the smoothness of different target distributions, referred to as an instance-dependent bound. Specifically, we establish an iteration complexity of $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ (up to logarithmic factors), where $d$ denotes the data dimension and $\varepsilon$ quantifies the output accuracy in terms of total variation (TV) distance. In addition, $L$ represents a relaxed Lipschitz constant, which, in the case of Gaussian mixture models, scales only logarithmically with the number of components, the dimension, and the iteration number, demonstrating broad applicability.
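To make the three regimes of the stated iteration complexity concrete, the following minimal Python sketch evaluates $\min\{d,d^{2/3}L^{1/3},d^{1/3}L\}\varepsilon^{-2/3}$ and reports which term is the smallest. The function names are hypothetical, and constants and logarithmic factors are dropped, so the returned number indicates scaling only, not an actual step count.

```python
import math


def iteration_complexity(d: float, L: float, eps: float) -> float:
    """Evaluate min{d, d^(2/3) L^(1/3), d^(1/3) L} * eps^(-2/3).

    Constants and logarithmic factors are ignored (an assumption of this
    sketch), so the result only illustrates how the bound scales.
    """
    if d <= 0 or L <= 0 or eps <= 0:
        raise ValueError("d, L and eps must be positive")
    factor = min(d, d ** (2 / 3) * L ** (1 / 3), d ** (1 / 3) * L)
    return factor * eps ** (-2 / 3)


def active_regime(d: float, L: float) -> str:
    """Name the term that attains the minimum.

    Comparing the three terms pairwise gives the thresholds:
      d^(1/3) L <= d^(2/3) L^(1/3)  iff  L <= sqrt(d)
      d^(2/3) L^(1/3) <= d          iff  L <= d
    """
    if L <= math.sqrt(d):
        return "d^(1/3) L"
    if L <= d:
        return "d^(2/3) L^(1/3)"
    return "d"


if __name__ == "__main__":
    for L in (1.0, 27.0, math.inf):
        print(L, active_regime(64.0, L), iteration_complexity(64.0, L, 0.125))
```

With $L = \infty$ (no smoothness information) the bound reduces to the $d\,\varepsilon^{-2/3}$ term, while a small $L$, as in the Gaussian mixture case, activates the $d^{1/3}L$ term.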

Paper Structure

This paper contains 35 sections, 15 theorems, 249 equations, 3 figures.

Key Result

Theorem 1

Suppose that the assumptions labeled assu:distribution and assu:score-error hold, and that $K = c_2\min\{d\log^2 T, L\log T\}$ for some constant $c_2 > 0$. Then the sampling process (eq:sampler) with the learning rate schedule (eq:learning-rate) satisfies [displayed bound missing from the source] for some sufficiently large constant $C > 0$, where $L$ is defined in the definition labeled def:score-lipschitz.
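The choice of $K$ in Theorem 1 can be evaluated numerically as in the sketch below. The value of $c_2$ is not given in the source, so the default `c2=1.0` is a placeholder assumption, as are the function name and the use of the natural logarithm (a different base only changes the constant).

```python
import math


def choose_K(d: float, L: float, T: float, c2: float = 1.0) -> float:
    """Return c2 * min{d log^2 T, L log T}.

    c2 is an unspecified positive constant in the theorem; 1.0 is a
    placeholder assumed here purely for illustration.
    """
    if T <= 1:
        raise ValueError("T must exceed 1 so that log T is positive")
    if d <= 0 or L <= 0 or c2 <= 0:
        raise ValueError("d, L and c2 must be positive")
    log_T = math.log(T)  # natural log, assumed
    return c2 * min(d * log_T ** 2, L * log_T)


if __name__ == "__main__":
    # The L log T term is the smaller one exactly when L <= d log T.
    print(choose_K(d=100, L=5, T=1000))
    print(choose_K(d=100, L=math.inf, T=1000))
```

The second call shows that without any smoothness information ($L = \infty$) the choice falls back to the dimension-dependent term $d\log^2 T$.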

Figures (3)

  • Figure 1: Comparison of Theorem 1 with prior results. Left: the iteration complexity as a function of $L$ when $\varepsilon = O(1)$. Right: the iteration complexity as a function of $\varepsilon$ when $L=\infty$.
  • Figure 2: Sampling error of the proposed sampler and fitted rate $T \rightarrow \Theta(\log^4 T/T^3)$: (a) $d = 10, k = 10$; (b) $d = 100, k = 10$; (c) $d = 500, k = 100$.
  • Figure 3: TV distance $\varepsilon$ achieved by Theorem 1 and previous results. Left: $T = O(d)$; middle: $T=O(d^{3/2})$; right: $T=O(d^2)$.

Theorems & Definitions (21)

  • Definition 1
  • Definition 2: Non-uniform Lipschitz property
  • Remark 1
  • Example 1: Gaussian distribution
  • Example 2: Gaussian Mixture Models
  • Theorem 1
  • Remark 2
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • ...and 11 more