What's the score? Automated Denoising Score Matching for Nonlinear Diffusions

Raghav Singhal; Mark Goldstein; Rajesh Ranganath

TL;DR

This paper tackles the challenge of training score-based diffusion models when the forward transition is nonlinear and intractable by introducing Local Denoising Score Matching (local-DSM). Local-DSM builds the training objective from local increments of the transition kernel $q(\mathbf{y}_t|\mathbf{y}_s)$ and uses local linearization to yield Gaussian transitions, which enables automated score estimation without an explicit solution for the nonlinear $q(\mathbf{y}_t|\mathbf{y}_0)$. The paper derives mean and covariance equations via matrix exponentials, proposes error-control schedules for the Taylor approximation, and provides algorithms and extensions (perceptual weighting, score modeling) together with theoretical error bounds. Empirically, local-DSM achieves faster training and improved sample quality on low-dimensional and CIFAR-10 tasks with non-Gaussian priors, and it enables score estimation for nonlinear processes in physics and related sciences, broadening the practical reach of diffusion-based modeling.
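
The local linearization step described above can be illustrated in one dimension, where the matrix exponential reduces to an ordinary exponential. The following is a minimal sketch under stated assumptions, not the paper's implementation: the SDE is assumed to be scalar, $dy = f(y)\,dt + g\,dW$ with constant $g$, and the names `local_gaussian_transition`, `transition_score`, `f`, `df`, and `g` are hypothetical.

```python
import math


def local_gaussian_transition(f, df, g, y_s, dt):
    """Gaussian approximation of y_t | y_s for dy = f(y) dt + g dW (scalar).

    The drift is replaced by its first-order Taylor expansion around y_s,
    f(y) ~ f(y_s) + df(y_s) * (y - y_s), which gives a linear SDE whose
    transition mean and variance are known in closed form.
    """
    a = df(y_s)  # slope of the drift at y_s
    b = f(y_s)   # value of the drift at y_s
    if abs(a) < 1e-12:
        # Limit a -> 0: constant drift plus Brownian motion.
        mean = y_s + b * dt
        var = g * g * dt
    else:
        # expm1(x) = exp(x) - 1, evaluated accurately for small x.
        mean = y_s + b * math.expm1(a * dt) / a
        var = g * g * math.expm1(2.0 * a * dt) / (2.0 * a)
    return mean, var


def transition_score(y_t, mean, var):
    """Gradient of log N(y_t; mean, var) with respect to y_t."""
    return -(y_t - mean) / var


# For a linear drift the linearization is exact: an Ornstein-Uhlenbeck
# process with f(y) = -y and g = sqrt(2) has mean y_s * exp(-dt).
m, v = local_gaussian_transition(lambda y: -y, lambda y: -1.0,
                                 math.sqrt(2.0), 1.0, 0.5)
```

For a nonlinear drift the same formulas hold only approximately, with an error that depends on the gap `dt` between $s$ and $t$.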

Abstract

Reversing a diffusion process by learning its score is central to diffusion-based generative modeling and to estimating properties of scientific systems. The diffusion processes that are tractable center on linear processes with a Gaussian stationary distribution. This limits the kinds of models that can be built to those that target a Gaussian prior or, more generally, limits the kinds of problems that can be generically solved to those that have conditionally linear score functions. In this work, we introduce a family of tractable denoising score matching objectives, called local-DSM, built using local increments of the diffusion process. We show how local-DSM melded with Taylor expansions enables automated training and score estimation with nonlinear diffusion processes. To demonstrate these ideas, we use automated-DSM to train generative models using non-Gaussian priors on challenging low-dimensional distributions and the CIFAR10 image dataset. Additionally, we use automated-DSM to learn the scores for nonlinear processes studied in statistical physics.
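
To make the idea of an objective "built using local increments" concrete, here is a hedged sketch of how one training pair might be generated for a scalar SDE $dy = f(y)\,dt + g\,dW$. It assumes an Euler-Maruyama sampler up to time $s$, a locally linearized Gaussian step from $s$ to $t$, and a squared-error regression onto the transition score. The function names and the exact form of the loss are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def euler_maruyama(f, g, y0, s, n_steps, rng):
    """Simulate dy = f(y) dt + g dW from time 0 to time s (scalar)."""
    y = y0
    if n_steps <= 0 or s <= 0.0:
        return y
    h = s / n_steps
    for _ in range(n_steps):
        y += f(y) * h + g * math.sqrt(h) * rng.gauss(0.0, 1.0)
    return y


def local_dsm_target(f, df, g, x, s, t, n_steps, rng):
    """Return (y_t, target), where target approximates grad log q(y_t | y_s)."""
    # 1) Numerically sample y_s given y_0 = x.
    y_s = euler_maruyama(f, g, x, s, n_steps, rng)
    # 2) Linearize the drift around y_s to get a Gaussian transition.
    a, b, dt = df(y_s), f(y_s), t - s
    if abs(a) < 1e-12:
        mean, var = y_s + b * dt, g * g * dt
    else:
        mean = y_s + b * math.expm1(a * dt) / a
        var = g * g * math.expm1(2.0 * a * dt) / (2.0 * a)
    # 3) Sample y_t from the local Gaussian and evaluate its score at y_t.
    y_t = mean + math.sqrt(var) * rng.gauss(0.0, 1.0)
    return y_t, -(y_t - mean) / var


def local_dsm_loss(score_fn, y_t, t, target):
    """Squared error between a model score and the local transition score."""
    return (score_fn(y_t, t) - target) ** 2


# Usage with an illustrative cubic drift f(y) = -y**3 and a dummy model.
rng = random.Random(0)
y_t, target = local_dsm_target(lambda y: -y ** 3, lambda y: -3.0 * y * y,
                               1.0, 0.5, 0.30, 0.35, 30, rng)
loss = local_dsm_loss(lambda y, t: -y, y_t, 0.35, target)
```

In practice the score model would be a neural network and the loss would be averaged over data points, times, and noise draws; the sketch only shows how a single target is formed.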

Paper Structure (41 sections, 6 theorems, 106 equations, 10 figures, 2 tables, 2 algorithms)

Introduction
Related Work
Implicit nonlinear Diffusions.
Stochastic Interpolants.
Background and Setup
Denoising Score Matching.
Assumptions.
Automated DSM training for nonlinear diffusions
Local DSM.
Local Linearization.
Mean and Covariance Equations.
Mean and Covariance Estimation.
Algorithms.
Controlling the Taylor Error with Scheduled Pairs.
Bounds on the error from Taylor expansion.
...and 26 more sections

Key Result

Lemma 1

Let $q(\mathbf{y}_s ~\vert~ x)$ and $q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)$ be the transition kernels of the process defined in eq:generic_sde. For any $0 \leq s < t < T$, we have: [displayed equation missing from this extract] where $q(\mathbf{y}_t, \mathbf{y}_s ~\vert~ x) = q(\mathbf{y}_t ~\vert~ \mathbf{y}_s)q(\mathbf{y}_s ~\vert~ x)$.
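
The displayed identity of Lemma 1 is not present in this extract. As a hedged guide to what a statement of this shape typically asserts, and not a verified transcription of the paper's equation, the following standard identity holds for any Markov process whose kernels factor as stated in the lemma:

$$
\nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t \mid x)
= \mathbb{E}_{q(\mathbf{y}_s \mid \mathbf{y}_t, x)}
\left[ \nabla_{\mathbf{y}_t} \log q(\mathbf{y}_t \mid \mathbf{y}_s) \right],
\qquad
q(\mathbf{y}_s \mid \mathbf{y}_t, x)
= \frac{q(\mathbf{y}_t \mid \mathbf{y}_s)\, q(\mathbf{y}_s \mid x)}{q(\mathbf{y}_t \mid x)}.
$$

It follows by differentiating $q(\mathbf{y}_t \mid x) = \int q(\mathbf{y}_t \mid \mathbf{y}_s)\, q(\mathbf{y}_s \mid x)\, d\mathbf{y}_s$ under the integral sign and dividing by $q(\mathbf{y}_t \mid x)$. An identity of this kind is what allows the intractable score of $q(\mathbf{y}_t \mid x)$ to be regressed against the score of the local kernel $q(\mathbf{y}_t \mid \mathbf{y}_s)$.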

Figures (10)

Figure 1: Training with Automated DSM: Given a nonlinear inference process $q$ and a time $t$ with sample $\mathbf{y}_0 = x$, we run a numerical sampler until time $s(t)$ and then use the locally linear process to sample $\mathbf{y}_t ~\vert~ \mathbf{y}_s$ and estimate the transition score.
Figure 2: Local mean $\mathbf{m}_{t|s}$ estimation error: we compare the estimation error when using the schedule $s_{\lambda}(t)$ versus $s(t) = t - 0.05$. Using $s(t)$ instead of $s_\lambda(t)$ leads to higher error.
Figure 3: ISM vs local-DSM: Samples from a local-DSM trained model in the middle panel, and samples from an ISM trained model in the right panel. Both models were trained for $20$k gradient steps, but the local-DSM trained model has better sample quality.
Figure 4: ISM vs local-DSM. Samples from models trained with the ISM and local-DSM objectives, using a MoG prior. The local-DSM trained model samples are significantly better than the ISM trained model samples.
Figure 5: Samples at $t \in \{1, 3, 5\}$. Here we compare samples from the process defined in eq:active_swimmer in the left panel with local-DSM and ISM trained model samples in the middle and right panels. The inference process and local-DSM trained model samples are nearly identical. $\mathbf{\textcolor{green}{\star}}$ The ISM trained model's sample quality did not match that of the inference process's samples and diverged; see fig:active_swimmer_ism_only for model samples.
...and 5 more figures
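
The Figure 2 caption reports that a larger gap between $s$ and $t$ gives a larger local-mean error. The toy check below illustrates that qualitative effect under strong simplifying assumptions: it uses a noise-free scalar drift $f(y) = -y^3$, for which the exact flow is known in closed form, and compares it with the linearized mean. The schedule $s_\lambda(t)$ itself is not specified in this extract and is not implemented here; all function names are hypothetical.

```python
import math


def linearized_mean(f, df, y_s, dt):
    """Mean of the locally linearized process started at y_s after time dt."""
    a, b = df(y_s), f(y_s)
    if abs(a) < 1e-12:
        return y_s + b * dt
    return y_s + b * math.expm1(a * dt) / a


def exact_cubic_flow(y_s, dt):
    """Exact solution of dy/dt = -y**3 started at y_s, evaluated after dt."""
    return y_s / math.sqrt(1.0 + 2.0 * y_s * y_s * dt)


def mean_error(y_s, dt):
    """Absolute gap between the linearized mean and the exact noise-free flow."""
    f = lambda y: -y ** 3
    df = lambda y: -3.0 * y * y
    return abs(linearized_mean(f, df, y_s, dt) - exact_cubic_flow(y_s, dt))


# A fixed gap of 0.05 (as in s(t) = t - 0.05) versus a smaller gap of 0.01.
err_wide = mean_error(1.0, 0.05)
err_narrow = mean_error(1.0, 0.01)
```

In this example the error shrinks roughly with the cube of the gap, which is consistent with the motivation for choosing $s$ adaptively rather than at a fixed offset from $t$.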

Theorems & Definitions (9)

Lemma 1
Theorem 1
Lemma
proof
Theorem
proof
Lemma 2
Proposition 1: Lemma 2.21 in albergo2023stochastic
proof
