Operator-Informed Score Matching for Markov Diffusion Models

Zheyang Shen; Huihui Wang; Marina Riabiz; Chris J. Oates

TL;DR

The paper argues that Markov diffusion operators offer theoretical and practical advantages for score-based generative modeling because their spectral structure is explicit. It introduces operator-informed score matching (OISM), which expresses the score at every noise level as a linear combination of eigenfunctions of the infinitesimal generator $\mathscr{L}$ and uses the relation $P_t\phi_n = e^{\lambda_n t}\phi_n$ to compute scores without forward simulation. The score matching objective then becomes a quadratic form in the eigen-coefficients with closed-form minimizer $\hat{\boldsymbol{\alpha}} = \mathbf{A}_t^{-1} \mathbf{b}_t$, which gives a score estimator $\tilde{\mathbf{s}}_t(x)$. Empirically, OISM accelerates training on low-dimensional tasks and complements neural estimators in high dimensions via a residual approach, though higher-order eigenfunctions do not always improve performance. The work points to a practical way of combining spectral insights about diffusion operators with standard diffusion models for faster, potentially more data-efficient generative modeling.
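As a rough illustration of how a linear-in-parameters score model leads to a closed-form minimizer, the sketch below fits $s(x) = \sum_k \alpha_k \psi_k(x)$ by minimising the integration-by-parts score matching objective $\tfrac{1}{2}\boldsymbol{\alpha}^\top \mathbf{A} \boldsymbol{\alpha} - \mathbf{b}^\top \boldsymbol{\alpha}$, with $\mathbf{A} = \mathbb{E}[\psi\psi^\top]$ and $\mathbf{b} = -\mathbb{E}[\psi']$. Several things here are assumptions and not taken from the paper: the monomial basis (standing in for eigenfunction-derived features), the sign convention for $\mathbf{b}$ (chosen so that the minimizer reads $\mathbf{A}^{-1}\mathbf{b}$), the standard Gaussian test data, and the hypothetical names `basis`, `basis_grad` and `fit_score_coefficients`.

```python
import numpy as np

def basis(x):
    """Illustrative features psi(x) = (1, x, x^2); an assumed stand-in basis."""
    return np.stack([np.ones_like(x), x, x**2], axis=-1)

def basis_grad(x):
    """Derivatives psi'(x) = (0, 1, 2x) of the features above."""
    return np.stack([np.zeros_like(x), np.ones_like(x), 2.0 * x], axis=-1)

def fit_score_coefficients(samples):
    """Minimise 0.5 * a^T A a - b^T a using sample averages only."""
    psi = basis(samples)
    dpsi = basis_grad(samples)
    A = psi.T @ psi / len(samples)   # A = E[psi psi^T]
    b = -dpsi.mean(axis=0)           # b = -E[psi'], from integration by parts
    return np.linalg.solve(A, b)     # closed-form minimizer A^{-1} b

# Assumed test case: data from N(0, 1), whose true score is s(x) = -x.
rng = np.random.default_rng(0)
x = rng.standard_normal(200_000)
alpha_hat = fit_score_coefficients(x)
```

For this test case the fitted coefficients should be close to $(0, -1, 0)$, since the true score $-x$ lies in the span of the basis. Solving a small linear system replaces gradient-based training here, which is the feature the TL;DR attributes to OISM.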

Abstract

Diffusion models are typically trained using score matching, a learning objective agnostic to the underlying noising process that guides the model. This paper argues that Markov noising processes enjoy an advantage over alternatives, as the Markov operators that govern the noising process are well-understood. Specifically, by leveraging the spectral decomposition of the infinitesimal generator of the Markov noising process, we obtain parametric estimates of the score functions simultaneously for all marginal distributions, using only sample averages with respect to the data distribution. The resulting operator-informed score matching provides both a standalone approach to sample generation for low-dimensional distributions, as well as a recipe for better informed neural score estimators in high-dimensional settings.
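The eigenfunction relation behind "using only sample averages with respect to the data distribution" can be checked numerically in a simple case. The sketch below assumes a one-dimensional OU process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$, whose generator has the probabilists' Hermite polynomials $He_n$ as eigenfunctions with eigenvalues $\lambda_n = -n$; this parameterisation is an assumption and may differ from the paper's. The helper names `hermite` and `ou_semigroup` are hypothetical.

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite(n, x):
    """Probabilists' Hermite polynomial He_n evaluated at x."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return He.hermeval(x, coeffs)

def ou_semigroup(f, x, t, n_nodes=40):
    """(P_t f)(x) = E[f(X_t) | X_0 = x] for the assumed OU process.

    Uses X_t | X_0 = x ~ N(x e^{-t}, 1 - e^{-2t}) and Gauss-Hermite quadrature.
    """
    nodes, weights = He.hermegauss(n_nodes)    # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalise to the N(0, 1) density
    mean = np.exp(-t) * np.asarray(x)[..., None]
    std = np.sqrt(1.0 - np.exp(-2.0 * t))
    return (f(mean + std * nodes) * weights).sum(axis=-1)

x = np.linspace(-2.0, 2.0, 5)
t, n = 0.3, 3
lhs = ou_semigroup(lambda y: hermite(n, y), x, t)   # P_t He_n
rhs = np.exp(-n * t) * hermite(n, x)                # e^{lambda_n t} He_n
```

Because $\mathbb{E}_{\rho_t}[f] = \mathbb{E}_{\rho_0}[P_t f]$, the agreement of `lhs` and `rhs` implies $\mathbb{E}_{\rho_t}[\phi_n] = e^{\lambda_n t}\,\mathbb{E}_{\rho_0}[\phi_n]$: one sample average over the data distribution yields the corresponding expectation at every noise level.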

Paper Structure (48 sections, 65 equations, 5 figures, 1 table)

Set-up and Notation
Diffusion Models through the Lens of Markov Diffusion Operators
Markov Diffusion Operators and their Spectral Decompositions
Score Matching via Generalized Integration-by-Parts
Operator-Informed Score Matching
Score Matching as a Quadratic Form
Regularised Estimation of Eigenfunctions via Shrinkage
An Illustrative Experiment
Practical oism in High Dimensions
Experimental Assessment
Implementation detail
Results
Interpretation
Discussion
Related work
...and 33 more sections

Figures (5)

Figure 1: Illustration of OISM in 1D. The first column contains the ground truth density and score functions $\nabla \log \rho_t$ at $t=0$ and $t=0.02$. OISM I is based on exact $\mathbb{E}_{\rho_0}[\phi_k]$, highlighting the capacity of an eigenfunction basis to provide high-quality score estimates across different levels of noise perturbation, which leads to accurate density estimates via the probability flow ODE. OISM II instead uses data-based shrinkage estimates for $\mathbb{E}_{\rho_0}[\phi_k]$; the density remains accurately estimated. As a baseline, the fourth column shows the score and density estimates obtained via DDPM, a neural network trained via denoising score matching (SM). [In the second and third rows the score matching loss is displayed in the subtitles.] The final column shows (top) the evolution of the SM loss for DDPM (red) and OISM II (blue), and (bottom) the SM loss for OISM II as a function of the number of eigenfunctions, corresponding to sample means (red) and shrinkage estimators (blue). [Error bars denote the standard error over 50 simulations.] The Stein effect is clearly visible: there is little difference between the sample mean and shrinkage estimators when the number of eigenfunctions is small, but the performance gap widens as the number of eigenfunctions increases.
Figure 2: Illustration of OISM in 2D. The leftmost plot shows a subset of the training data. OISM was applied to estimate the score function, using 200 eigenfunctions in total. The density recovered by inverting the probability flow ODE is quite accurate, which we attribute to the periodicity of the eigenfunctions of the BM forward process.
Figure 3: Performance evaluation on CIFAR-10. Top left: Smoothed score matching (SM) loss for DDPM (Song et al., 2021), OISM (3), and OISM (6). Bottom left: Negative log-likelihood evaluated on the test dataset. Right: Samples corresponding to different training iterations, obtained by reversing the probability flow ODE.
Figure 4: Additional illustrations on toy data. The first column shows (a subset of) ground truth samples from the data distribution; the second column visualizes the OISM density obtained via the probability flow ODE; the third column plots sample points simulated via the probability flow ODE; the last column plots samples generated by a reverse SDE sampler.
Figure 5: Coefficients $\widehat{\boldsymbol{\alpha}}$ from OISM (6) as a function of the normalized time $\tau$. Left: exact solution on the grid of points at which the linear systems were solved during training. Right: linearly interpolated solutions obtained on a grid of 1,000 evenly-spaced time points. Simple linear interpolation is sufficient for training and evaluation of the neural network, due to the temporal correlation.
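The interpolation step described in the caption of Figure 5 can be sketched as follows. This is a minimal example under stated assumptions, not the authors' code: the coarse grid, the placeholder coefficient curves, and the function name `interpolate_coefficients` are all hypothetical, and only the use of 1,000 evenly spaced time points and linear interpolation comes from the caption.

```python
import numpy as np

def interpolate_coefficients(tau_query, tau_grid, alpha_grid):
    """Linearly interpolate each coefficient alpha_k(tau) from a coarse grid.

    alpha_grid has shape (len(tau_grid), K): one row per solved linear system.
    """
    alpha_grid = np.asarray(alpha_grid)
    columns = [
        np.interp(tau_query, tau_grid, alpha_grid[:, k])
        for k in range(alpha_grid.shape[1])
    ]
    return np.stack(columns, axis=-1)

# Placeholder inputs (assumed for illustration, not values from the paper).
tau_grid = np.linspace(0.0, 1.0, 11)
alpha_grid = np.stack([np.exp(-tau_grid), tau_grid**2], axis=-1)

# Evaluate on a fine grid of 1,000 evenly spaced normalized times.
tau_fine = np.linspace(0.0, 1.0, 1000)
alpha_fine = interpolate_coefficients(tau_fine, tau_grid, alpha_grid)
```

When the coefficients vary smoothly in $\tau$, as the caption suggests, the linear systems need only be solved on a coarse grid, and the interpolated values can be looked up cheaply at whatever time points training or evaluation requires.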

Theorems & Definitions (8)

Example 1: BM process
Example 2: OU process
Example 3: BM process, continued
Example 4: OU process, continued
Example 5: OU and BM processes, continued
Example 6: OU process, continued
Definition 1: Symmetric Markov semigroup
Example 7: Truncated BM
