Initialization-Aware Score-Based Diffusion Sampling

Tiziano Fassina; Gabriel Cardoso; Sylvain Le Corff; Thomas Romary

TL;DR

This work presents a Kullback-Leibler convergence analysis of Variance Exploding diffusion samplers and proposes a theoretically grounded sampling strategy that learns the reverse-time initialization, directly minimizing the initialization error.

Abstract

Score-based generative models (SGMs) generate samples from a target distribution by approximating the reverse-time dynamics of a stochastic differential equation. Despite their strong empirical performance, classical samplers initialized from a Gaussian distribution require a long noising time horizon, which typically induces a large number of discretization steps and a high computational cost. In this work, we present a Kullback-Leibler convergence analysis of Variance Exploding diffusion samplers that highlights the critical role of the backward process initialization. Based on this result, we propose a theoretically grounded sampling strategy that learns the reverse-time initialization, directly minimizing the initialization error. The resulting procedure is independent of the specific score training procedure, network architecture, and discretization scheme. Experiments on toy distributions and benchmark datasets demonstrate competitive or improved generative quality while using significantly fewer sampling steps.
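The role of the initialization can be illustrated on a one-dimensional Gaussian toy case, where the initialization error has a closed form. The sketch below is not the paper's procedure: it assumes Gaussian data with a hypothetical mean and variance, uses the fact that a Variance Exploding forward process adds independent noise of variance $\sigma_T^2$ to the data, and compares the KL divergence between the noised marginal $p_T$ and (a) the usual centered Gaussian initialization, (b) an initialization that matches $p_T$ exactly, which stands in for a learned initialization. The function names and the data parameters are illustrative assumptions; the $\sigma_T$ values are borrowed from the Figure 2 caption purely for illustration.

```python
import math


def kl_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(N(mean_p, var_p) || N(mean_q, var_q)) for one-dimensional Gaussians."""
    return 0.5 * (
        var_p / var_q
        + (mean_p - mean_q) ** 2 / var_q
        - 1.0
        + math.log(var_q / var_p)
    )


def init_error_standard(data_mean, data_var, sigma_T):
    """Initialization error when the sampler starts from N(0, sigma_T^2)."""
    # VE forward marginal at time T: data plus independent N(0, sigma_T^2) noise.
    noised_var = data_var + sigma_T ** 2
    return kl_gaussians(data_mean, noised_var, 0.0, sigma_T ** 2)


def init_error_matched(data_mean, data_var, sigma_T):
    """Initialization error when the start distribution equals p_T exactly.

    This is an idealized stand-in for a learned initialization (assumption).
    """
    noised_var = data_var + sigma_T ** 2
    return kl_gaussians(data_mean, noised_var, data_mean, noised_var)


if __name__ == "__main__":
    data_mean, data_var = 3.0, 1.0  # hypothetical toy data distribution
    for sigma_T in (2.0, 7.0, 80.0):
        standard = init_error_standard(data_mean, data_var, sigma_T)
        matched = init_error_matched(data_mean, data_var, sigma_T)
        print(f"sigma_T={sigma_T:5.1f}  standard={standard:.6f}  matched={matched:.6f}")
```

In this toy setting the centered Gaussian initialization only becomes accurate when $\sigma_T$ is large, whereas an initialization that matches $p_T$ has zero error at any horizon, which is the motivation for shortening the horizon once the initialization is modeled.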

Paper Structure

This paper contains 42 sections, 12 theorems, 93 equations, 9 figures, 16 tables, 3 algorithms.

Introduction
Related Works
Training intermediate distributions.
SGM Guarantees.
Complementary Methods.
KL Control & Short Time Diffusion
Context
Forward process.
Backward process.
Score approximation and discretization.
Initialization of the backward sampler.
Hypothesis & Main Result
KL-driven initialization learning
Experiments
Gaussian Mixture Model.
...and 27 more sections

Key Result

Theorem 3.1

Assume that assumptions H[assum:moment] through H[assum:novikov] hold. Then, for all $\delta > 0$, [equation missing in source] with [equation missing in source], where $\mathcal{E}_{\theta}(t_k,\overleftarrow{X}_{t_k})$ and $\mathcal{E}(t_k,t,\overleftarrow{X}_{t_k},\overleftarrow{X}_{t})$ are defined in [eq:def:score:err]. In addition, the term $\mathsf{E}_{\operatorname{disc}}$ is upper bounded by [equation missing in source]. Assume in addition that H[assum:train] holds. Then, [equation missing in source].

Figures (9)

Figure 1: Comparison of sampling trajectories. Traditional SGMs sample across the full horizon $T$ from a Gaussian, while our approach models the intermediate noise distribution, enabling short-horizon sampling that preserves generative quality and reduces computation.
Figure 2: Quantile plot (levels 0.1 to 0.999999) of the mean and standard deviation over $d=100$ dimensions for heavy-tailed distributions, comparing $p_\infty$, $p_T$, $p_\theta$, and real data (each drawn with its own marker), with $\sigma_T = 2, 7, 80$, respectively. The x-axis shows the quantile levels and the y-axis the quantile values.
Figure 3: ImageNet$_{\text{birds}}$ representative nearest neighbor samples per label.
Figure 4: FFHQ representative nearest neighbor samples.
Figure 5: Illustration of different choices of the discretization points for the GMM case.
...and 4 more figures

Theorems & Definitions (22)

Theorem 3.1
Theorem 3.4: Adaptation of [tang2021empirical, Theorem 3]
Theorem 1.1
proof
Lemma 1.2
proof
Lemma 1.3
proof
Lemma 1.4: Fokker–Planck Equation
proof
...and 12 more
