An analysis of the noise schedule for score-based generative models

Stanislas Strasman, Antonio Ocello, Claire Boyer, Sylvain Le Corff, Vincent Lemaire

TL;DR

The paper studies how time-inhomogeneous noise schedules affect score-based generative models (SGMs). It introduces a unified forward-backward diffusion framework with a parametric noise schedule and derives a non-asymptotic bound on the KL divergence that depends explicitly on the schedule, plus a refined Wasserstein bound under Lipschitz and strong-log-concavity assumptions. The authors show that accounting for contraction of the backward process improves the mixing-time error, and they provide numerical experiments on Gaussian targets and CIFAR-10 to guide schedule design, including a parametric schedule that often outperforms the standard linear and cosine schedules. They also extend the analysis to non-Gaussian targets via Wasserstein-based metrics and demonstrate practical data preprocessing strategies that tighten the bounds and improve generation quality. Overall, the work offers theoretical and empirical guidance for selecting and tuning noise schedules for SGMs in both simple and complex data settings, with publicly available code for reproducibility.

Abstract

Score-based generative models (SGMs) aim at estimating a target data distribution by learning score functions using only noise-perturbed samples from the target. Recent literature has focused extensively on assessing the error between the target and estimated distributions, gauging the generative quality through the Kullback-Leibler (KL) divergence and Wasserstein distances. Under mild assumptions on the data distribution, we establish an upper bound for the KL divergence between the target and the estimated distributions, explicitly depending on any time-dependent noise schedule. Under additional regularity assumptions, taking advantage of favorable underlying contraction mechanisms, we provide a tighter error bound in Wasserstein distance compared to state-of-the-art results. In addition to being tractable, this upper bound jointly incorporates properties of the target distribution and SGM hyperparameters that need to be tuned during training. Finally, we illustrate these bounds through numerical experiments using simulated and CIFAR-10 datasets, identifying an optimal range of noise schedules within a parametric family.
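
To make the role of a time-dependent noise schedule concrete, the sketch below draws noise-perturbed samples under a variance-preserving forward SDE, $dX_t = -\tfrac{1}{2}\beta(t) X_t\,dt + \sqrt{\beta(t)}\,dB_t$, for which $X_t \mid X_0 \sim \mathcal{N}\big(e^{-\frac{1}{2}\int_0^t \beta(s)ds} X_0,\ (1 - e^{-\int_0^t \beta(s)ds}) I_d\big)$. It is a minimal illustration and not the authors' code: the linear schedule, its endpoints 0.1 and 20, and all function names are assumptions chosen for the example, and the paper's own parametrization of the forward process may differ.

```python
import numpy as np

def linear_beta(t, beta_min=0.1, beta_max=20.0, T=1.0):
    """Linear schedule on [0, T]; the endpoint values are assumed, not taken from the paper."""
    return beta_min + (beta_max - beta_min) * np.asarray(t, dtype=float) / T

def integrated_beta(beta, t, n_grid=1001):
    """Approximate int_0^t beta(s) ds with the trapezoid rule.

    `beta` must accept a NumPy array of times and return an array of the same shape.
    """
    grid = np.linspace(0.0, t, n_grid)
    vals = np.asarray(beta(grid), dtype=float)
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)))

def perturb(x0, t, beta, rng):
    """Sample X_t given X_0 = x0 under the variance-preserving forward SDE.

    Returns the noisy sample together with the mean scale and the noise std.
    """
    big_b = integrated_beta(beta, t)
    mean_scale = np.exp(-0.5 * big_b)       # multiplies the clean data
    std = np.sqrt(1.0 - np.exp(-big_b))     # std of the added Gaussian noise
    z = rng.standard_normal(x0.shape)
    return mean_scale * x0 + std * z, mean_scale, std

rng = np.random.default_rng(0)
x0 = np.ones((4, 3))                        # toy "data": 4 samples in dimension 3
xt, mean_scale, std = perturb(x0, 1.0, linear_beta, rng)
```

Swapping `linear_beta` for any other vectorized schedule changes only the integral $\int_0^t \beta(s)ds$, which is the quantity through which the schedule enters the marginal of $X_t$.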

Paper Structure (62 sections, 27 theorems, 221 equations, 21 figures, 3 tables)

Key Result

Theorem 3.1

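Suppose that hypotheses hyp:sched, hyp:fisher_info and hyp:novikov hold. Then the KL divergence between $\pi_{\mathrm{data}}$ and $\widehat{\pi}_N^{(\beta,\theta)}$ admits an upper bound that depends explicitly on the noise schedule, with $h := \sup_{k \in \{1, \ldots, N\}} (t_k - t_{k-1})$ small enough and $t_0 := 0$.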

Figures (21)

  • Figure 1: Noise schedule $\beta_a$ over time for $a \in \{-10, -9, \ldots, 10\}$, with the linear schedule $a=0$ shown as a dashed line.
  • Figure 2: Comparison of the empirical KL divergence (top) and $\mathcal{W}_2$ distance (bottom) (mean $\pm$ std over 10 runs) between $\pi_{\mathrm{data}}$ and $\widehat{\pi}_N^{(\beta,\theta)}$ (orange) and the related upper bounds (blue) from Theorem 3.1 and Theorem 4.2 across parameter $a$ for noise schedule $\beta_a$, $d=50$. We also show the metrics for the linear VPSDE model (dashed line) and our model (dotted line) with exact score evaluation.
  • Figure 3: Comparison of the empirical $\mathcal{W}_2$ distance (mean value $\pm$ std over 10 runs) between $\pi_{\mathrm{data}}$ and the generative distribution $\widehat{\pi}_N^{(\beta, \theta)}$ across various dimensions. The distributions compared include SGMs with different noise schedules: $\beta_{a^\star}$ (blue solid), $\beta_0$ (yellow dashed), and $\beta_{\cos}$ (orange dotted).
  • Figure 4: Upper bound and sliced 2-Wasserstein distance on a Funnel dataset in dimension 50.
  • Figure 5: FID Scores using 50,000 generated samples for the parametric and cosine schedules (CIFAR-10 dataset).
  • ...and 16 more figures
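
When both distributions being compared are Gaussian, the KL divergence and the $\mathcal{W}_2$ distance reported in such comparisons have closed forms: $\mathrm{KL}(\mathcal{N}(m_0,\Sigma_0)\,\|\,\mathcal{N}(m_1,\Sigma_1)) = \tfrac{1}{2}\big[\mathrm{tr}(\Sigma_1^{-1}\Sigma_0) + (m_1-m_0)^\top \Sigma_1^{-1}(m_1-m_0) - d + \log\tfrac{\det\Sigma_1}{\det\Sigma_0}\big]$ and $\mathcal{W}_2^2 = \|m_0-m_1\|^2 + \mathrm{tr}\big(\Sigma_0 + \Sigma_1 - 2(\Sigma_1^{1/2}\Sigma_0\Sigma_1^{1/2})^{1/2}\big)$. The sketch below evaluates them with NumPy. It is an illustration under the assumption that the means and covariances are available (for example, estimated from samples); the function names are hypothetical and this is not necessarily how the figures were produced.

```python
import numpy as np

def sqrtm_psd(s):
    """Symmetric square root of a positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(s)
    w = np.clip(w, 0.0, None)               # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def gaussian_kl(m0, s0, m1, s1):
    """KL( N(m0, s0) || N(m1, s1) ) for positive definite covariances."""
    d = m0.shape[0]
    diff = m1 - m0
    s1_inv_s0 = np.linalg.solve(s1, s0)
    maha = diff @ np.linalg.solve(s1, diff)
    _, logdet0 = np.linalg.slogdet(s0)
    _, logdet1 = np.linalg.slogdet(s1)
    return 0.5 * (np.trace(s1_inv_s0) + maha - d + logdet1 - logdet0)

def gaussian_w2(m0, s0, m1, s1):
    """2-Wasserstein distance between N(m0, s0) and N(m1, s1)."""
    root1 = sqrtm_psd(s1)
    cross = sqrtm_psd(root1 @ s0 @ root1)
    w2_sq = np.sum((m0 - m1) ** 2) + np.trace(s0 + s1 - 2.0 * cross)
    return float(np.sqrt(max(w2_sq, 0.0)))  # clip rounding error below zero

# Toy check in dimension 2: N(0, I) against N(0, 4 I).
m = np.zeros(2)
kl = gaussian_kl(m, np.eye(2), m, 4.0 * np.eye(2))
w2 = gaussian_w2(m, np.eye(2), m, 4.0 * np.eye(2))
```

For this toy pair the formulas give $\mathrm{KL} = \tfrac{1}{2}(0.5 - 2 + \log 16) \approx 0.636$ and $\mathcal{W}_2 = \sqrt{2}$, which makes a convenient sanity check before applying the functions to empirical moments.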

Theorems & Definitions (48)

  • Theorem 3.1
  • Lemma 4.1
  • Theorem 4.2
  • Corollary 4.3
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma B.3
  • proof
  • ...and 38 more