
Initialization-Aware Score-Based Diffusion Sampling

Tiziano Fassina, Gabriel Cardoso, Sylvan Le Corff, Thomas Romary

TL;DR

This work presents a Kullback-Leibler convergence analysis of Variance Exploding diffusion samplers and proposes a theoretically grounded sampling strategy that learns the reverse-time initialization, directly minimizing the initialization error.

Abstract

Score-based generative models (SGMs) aim at generating samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long noising time horizon, which typically induces a large number of discretization steps and a high computational cost. In this work, we present a Kullback-Leibler convergence analysis of Variance Exploding diffusion samplers that highlights the critical role of the backward process initialization. Based on this result, we propose a theoretically grounded sampling strategy that learns the reverse-time initialization, directly minimizing the initialization error. The resulting procedure is independent of the specific score training procedure, network architecture, and discretization scheme. Experiments on toy distributions and benchmark datasets demonstrate competitive or improved generative quality while using significantly fewer sampling steps.
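To make the role of the initialization concrete, the following minimal sketch computes an initialization error in closed form for a toy case. It is an illustration under stated assumptions, not the paper's bound: the data are assumed to be an isotropic Gaussian $\mathcal{N}(\mu, s^2 I_d)$, the Variance Exploding forward process is assumed to add noise of variance $\sigma_T^2$ so that $p_T = \mathcal{N}(\mu, (s^2+\sigma_T^2) I_d)$, the classical initialization is taken to be $p_\infty = \mathcal{N}(0, \sigma_T^2 I_d)$, and the error is measured as $\mathrm{KL}(p_T \,\|\, p_\infty)$. The function name `init_kl_gaussian` and its parameters are hypothetical; the values $d=100$ and $\sigma_T \in \{2, 7, 80\}$ are borrowed from the Figure 2 caption purely for illustration.

```python
import math


def init_kl_gaussian(mu_sq_norm, data_var, sigma_T, dim):
    """KL( N(mu, a I_d) || N(0, b I_d) ) with a = data_var + sigma_T^2, b = sigma_T^2.

    Assumption (not from the paper): isotropic Gaussian data, so that the
    noised marginal p_T is Gaussian and the KL has a closed form.
    mu_sq_norm is the squared Euclidean norm of the data mean.
    """
    a = data_var + sigma_T ** 2   # per-coordinate variance of p_T
    b = sigma_T ** 2              # per-coordinate variance of the Gaussian init
    return 0.5 * (dim * a / b + mu_sq_norm / b - dim + dim * math.log(b / a))


if __name__ == "__main__":
    # Toy setting: d = 100, unit data variance, mean with squared norm 100.
    for sigma_T in (2.0, 7.0, 80.0):
        kl = init_kl_gaussian(mu_sq_norm=100.0, data_var=1.0, sigma_T=sigma_T, dim=100)
        print(f"sigma_T = {sigma_T:5.1f}  ->  KL(p_T || p_inf) = {kl:.4f}")
```

In this toy setting the error only becomes small when $\sigma_T$ is large, which is the long-horizon regime the abstract describes as costly. Initializing from a distribution closer to $p_T$ would reduce this term at a short horizon.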

Paper Structure (42 sections, 12 theorems, 93 equations, 9 figures, 16 tables, 3 algorithms)

Key Result

Theorem 3.1

Assume that assumptions assum:moment through assum:novikov hold. Then, for all $\delta > 0$, [display equation missing from this extract], where $\mathcal{E}_{\theta}(t_k,\overleftarrow{X}_{t_k})$ and $\mathcal{E}(t_k,t,\overleftarrow{X}_{t_k},\overleftarrow{X}_{t})$ are defined in eq:def:score:err. In addition, the term $\mathsf{E}_{\operatorname{disc}}$ is upper bounded by [display equation missing from this extract]. Assume in addition that assumption assum:train holds. Then, [display equation missing from this extract].

Figures (9)

  • Figure 1: Comparison of sampling trajectories. Traditional SGMs sample across the full horizon $T$ from a Gaussian, while our approach models the intermediate noise distribution, enabling short-horizon sampling that preserves generative quality and reduces computation.
  • Figure 2: Quantile plot (quantile levels 0.1 to 0.999999) of the mean and standard deviation over $d=100$ dimensions for heavy-tailed distributions, comparing $p_\infty$ (marked x), $p_T$, $p_\theta$, and real data, with $\sigma_T = \{2,7,80\}$ respectively. The x-axis shows the quantile levels and the y-axis the quantile values.
  • Figure 3: ImageNet$_{\text{birds}}$ representative nearest neighbor samples per label.
  • Figure 4: FFHQ representative nearest neighbor samples.
  • Figure 5: Illustration of different choices of the discretization points for the GMM case.
  • ...and 4 more figures

Theorems & Definitions (22)

  • Theorem 3.1
  • Theorem 3.4: Adaptation of [tang2021empirical, Theorem 3]
  • Theorem 1.1
  • proof
  • Lemma 1.2
  • proof
  • Lemma 1.3
  • proof
  • Lemma 1.4: Fokker–Planck Equation
  • proof
  • ...and 12 more