What's the score? Automated Denoising Score Matching for Nonlinear Diffusions

Raghav Singhal, Mark Goldstein, Rajesh Ranganath

TL;DR

This paper addresses the training of score-based diffusion models when the forward process is nonlinear and its transition kernel is intractable, introducing local denoising score matching (local-DSM). Local-DSM builds the training objective from local increments of the process, using the transition kernel $q(\mathbf{y}_t|\mathbf{y}_s)$ together with a local linearization that makes these transitions Gaussian. This enables automated score estimation without a closed-form solution for the nonlinear $q(\mathbf{y}_t|\mathbf{y}_0)$. The paper derives the mean and covariance equations of the linearized transitions via matrix exponentials, proposes schedules that control the error of the Taylor approximation, and provides algorithms and extensions (perceptual weighting, score modeling) with theoretical error bounds. Empirically, local-DSM trains faster and gives better sample quality on low-dimensional and CIFAR-10 tasks with non-Gaussian priors, and it enables score estimation for nonlinear processes in physics and related sciences.
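As a minimal sketch of the local linearization idea, the snippet below treats a scalar SDE $dy = f(y)\,dt + g\,dW$ and Taylor-expands the drift to first order around $\mathbf{y}_s$. The function names, the scalar setting, the constant diffusion coefficient `g`, and the user-supplied derivative `df` are all assumptions made for illustration; in one dimension the matrix exponential reduces to an ordinary exponential. It is not the paper's implementation.

```python
import math


def local_linear_transition(f, df, g, y_s, dt):
    """Gaussian approximation to q(y_t | y_s) for dy = f(y) dt + g dW (scalar).

    The drift is linearized around y_s:
        f(y) ~= f(y_s) + a * (y - y_s),  with a = df(y_s).
    The linearized SDE has a Gaussian transition with the mean and variance
    computed below (hypothetical helper, scalar case only).
    """
    a = df(y_s)
    if abs(a) < 1e-12:
        # Limit a -> 0: constant drift and Brownian variance.
        phi = dt
        var = g * g * dt
    else:
        phi = math.expm1(a * dt) / a                        # (e^{a dt} - 1) / a
        var = g * g * math.expm1(2.0 * a * dt) / (2.0 * a)  # g^2 (e^{2 a dt} - 1) / (2a)
    mean = y_s + f(y_s) * phi
    return mean, var


def transition_score(y_t, mean, var):
    """Score of the Gaussian transition: d/dy_t log N(y_t; mean, var)."""
    return -(y_t - mean) / var


# Example: a double-well drift, chosen here only as a simple nonlinear case.
mean, var = local_linear_transition(
    f=lambda y: y - y**3, df=lambda y: 1.0 - 3.0 * y * y, g=1.0, y_s=0.5, dt=0.01
)
score_at_mean = transition_score(mean, mean, var)  # zero at the Gaussian mean
```

For a linear drift such as $f(y) = -y$ the linearization is exact, so the sketch recovers the usual Ornstein-Uhlenbeck transition; for nonlinear drifts it is only accurate when the gap $t - s$ is small.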

Abstract

Reversing a diffusion process by learning its score forms the heart of diffusion-based generative modeling and of estimating properties of scientific systems. The diffusion processes that are tractable center on linear processes with a Gaussian stationary distribution. This limits the kinds of models that can be built to those that target a Gaussian prior or, more generally, limits the kinds of problems that can be generically solved to those that have conditionally linear score functions. In this work, we introduce a family of tractable denoising score matching objectives, called local-DSM, built using local increments of the diffusion process. We show how local-DSM, melded with Taylor expansions, enables automated training and score estimation with nonlinear diffusion processes. To demonstrate these ideas, we use automated-DSM to train generative models using non-Gaussian priors on challenging low-dimensional distributions and the CIFAR10 image dataset. Additionally, we use automated-DSM to learn the scores for nonlinear processes studied in statistical physics.

Paper Structure (41 sections, 6 theorems, 106 equations, 10 figures, 2 tables, 2 algorithms)

Key Result

Lemma 1

Let $q(\mathbf{y}_s ~\vert~ x), q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)$ be the transition kernels of the process defined in eq:generic_sde. For any $0 \leq s < t < T$, we have: where $q(\mathbf{y}_t, \mathbf{y}_s ~\vert~ x) = q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)q(\mathbf{y}_s ~\vert~ x)$.
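For intuition, the factorization stated in the lemma implies a standard identity that expresses the marginal score as a posterior average of transition scores. The derivation below is a sketch under that factorization, assuming differentiation and integration can be exchanged; it is not necessarily the exact statement of Lemma 1.

$$
\begin{aligned}
\nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t ~\vert~ x)
&= \frac{1}{q(\mathbf{y}_t ~\vert~ x)} \int \nabla_{\mathbf{y}_t} q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)\, q(\mathbf{y}_s ~\vert~ x)\, d\mathbf{y}_s \\
&= \int \nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)\, \frac{q(\mathbf{y}_t, \mathbf{y}_s ~\vert~ x)}{q(\mathbf{y}_t ~\vert~ x)}\, d\mathbf{y}_s \\
&= \mathbb{E}_{q(\mathbf{y}_s ~\vert~ \mathbf{y}_t, x)}\left[ \nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t ~\vert~ \mathbf{y}_s) \right].
\end{aligned}
$$

The first line uses $q(\mathbf{y}_t ~\vert~ x) = \int q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)\, q(\mathbf{y}_s ~\vert~ x)\, d\mathbf{y}_s$, and the second uses $\nabla q = q \nabla \log q$. Regressing a score model onto the transition score $\nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)$ under the joint $q(\mathbf{y}_t, \mathbf{y}_s ~\vert~ x)$ therefore targets the same conditional mean as regressing onto $\nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t ~\vert~ x)$.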

Figures (10)

  • Figure 1: Training with automated-DSM: given a nonlinear inference process $q$ and a time $t$ with sample $\mathbf{y}_0 = x$, we use a numerical sampler until time $s(t)$ and then use the locally linear process to sample $\mathbf{y}_t ~\vert~ \mathbf{y}_s$ and estimate the transition score.
  • Figure 2: Local mean $\mathbf{m}_{t|s}$ estimation error: we compare the estimation error when using the schedule $s_{\lambda}(t)$ versus the fixed-offset schedule $s(t) = t - 0.05$. Using $s(t)$ instead of $s_\lambda(t)$ leads to higher error.
  • Figure 3: ISM vs. local-DSM: samples from a local-DSM-trained model in the middle panel, and samples from an ISM-trained model in the right panel. Both models were trained for $20$k gradient steps; however, the local-DSM-trained model has better sample quality.
  • Figure 4: ISM vs. local-DSM. Samples from models with a MoG prior trained with the ISM and local-DSM objectives. We observe that the local-DSM-trained model's samples are significantly better than the ISM-trained model's samples.
  • Figure 5: Samples at $t \in \{1, 3, 5\}$. Here we compare samples from the process defined in eq:active_swimmer in the left panel with local-DSM-trained and ISM-trained model samples in the middle and right panels. The inference process and local-DSM-trained model samples are nearly identical. $\textcolor{green}{\star}$ We note that the ISM-trained model's sample quality did not match that of the inference process and diverged; see fig:active_swimmer_ism_only for model samples.
  • ...and 5 more figures
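The sampling procedure described in the Figure 1 caption can be sketched roughly as follows for a scalar SDE $dy = f(y)\,dt + g\,dW$. Everything here is an illustrative assumption: the helper names, the Euler-Maruyama sampler, the constant `g`, and in particular the fixed-offset schedule $s(t) = \max(t - \text{gap}, 0)$, which stands in for the paper's schedule $s_\lambda(t)$ and is not a reproduction of it.

```python
import math
import random


def euler_maruyama(f, g, y0, t_end, n_steps, rng):
    """Simulate dy = f(y) dt + g dW from time 0 to t_end (scalar state)."""
    y = y0
    h = t_end / n_steps
    for _ in range(n_steps):
        y = y + f(y) * h + g * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return y


def local_dsm_sample(f, df, g, x, t, gap, n_steps, rng):
    """Draw y_t and a regression target approximating grad log q(y_t | y_s).

    `gap` defines a hypothetical fixed-offset schedule s(t) = max(t - gap, 0).
    """
    s = max(t - gap, 0.0)
    y_s = euler_maruyama(f, g, x, s, n_steps, rng)  # numerical sampler up to s
    dt = t - s
    a = df(y_s)                                     # slope of the linearized drift
    if abs(a) < 1e-12:
        phi, var = dt, g * g * dt
    else:
        phi = math.expm1(a * dt) / a
        var = g * g * math.expm1(2.0 * a * dt) / (2.0 * a)
    mean = y_s + f(y_s) * phi
    y_t = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)  # Gaussian local step
    target = -(y_t - mean) / var                       # transition score at y_t
    return y_t, target


def local_dsm_loss(score_model, f, df, g, data, t_max, gap, n_steps, rng):
    """Monte Carlo estimate of E[(score_model(y_t, t) - target)^2]."""
    total = 0.0
    for x in data:
        t = rng.uniform(1e-3, t_max)
        y_t, target = local_dsm_sample(f, df, g, x, t, gap, n_steps, rng)
        total += (score_model(y_t, t) - target) ** 2
    return total / len(data)


# Example with a double-well drift and a trivial score model that returns 0.
loss = local_dsm_loss(
    score_model=lambda y, t: 0.0,
    f=lambda y: y - y**3, df=lambda y: 1.0 - 3.0 * y * y, g=1.0,
    data=[-1.0, 0.5, 1.0], t_max=1.0, gap=0.05, n_steps=50,
    rng=random.Random(0),
)
```

In practice the score model would be a neural network and the loss would be minimized with a gradient-based optimizer; the pure-Python version above only shows how the sample and its target are constructed.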

Theorems & Definitions (9)

  • Lemma 1
  • Theorem 1
  • Lemma
  • proof
  • Theorem
  • proof
  • Lemma 2
  • Proposition 1: Lemma 2.21 in albergo2023stochastic
  • proof