Target Score Matching

Valentin De Bortoli; Michael Hutchinson; Peter Wirnsberger; Arnaud Doucet

TL;DR

This work tackles the poor low-noise performance of denoising score matching by introducing Target Score Identity (TSI), which uses the known score of the clean target $\nabla \log p_X(x)$ to derive Target Score Matching (TSM) losses that exhibit favorable low-noise behavior. For additive noise, TSI yields $\nabla \log p_Y(y)=\int \nabla \log p_X(x)\, p_{X|Y}(x|y)\, \mathrm{d}x$, while DSM relies on $\nabla \log p_{Y|X}(y|x)$; the paper proves a relationship $\ell_{\text{TSM}}(\theta)=\ell_{\text{DSM}}(\theta)+$ constants and shows substantial variance reductions in practice. The authors extend these ideas to non-additive noise, Lie groups, and Bridge Matching, deriving corresponding score identities and practical estimators that can be implemented via MCMC or regression targets. Experiments on analytic targets and trained models demonstrate reduced estimator variance at low noise and faster convergence, validating the approach for physics-informed and manifold-valued applications. Overall, target-informed score matching provides a principled pathway to more stable, low-noise score estimation in diffusion-like models with broad applicability.
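The low-noise contrast between the two regression targets can be checked numerically on a toy problem. The sketch below is not taken from the paper: it assumes a one-dimensional standard Gaussian target $X \sim \mathcal{N}(0, 1)$ and Gaussian additive noise $Y = X + \sigma W$ with $W \sim \mathcal{N}(0, 1)$, so that the exact score $\nabla \log p_Y(y) = -y/(1+\sigma^2)$ is available in closed form. The function name `target_mse` and the chosen values of $\sigma$ are illustrative.

```python
import numpy as np


def target_mse(sigma, n=200_000, seed=0):
    """Monte Carlo MSE of single-sample DSM and TSM regression targets
    around the exact score of p_Y.

    Assumed toy setting (not from the paper): X ~ N(0, 1), Y = X + sigma * W,
    W ~ N(0, 1), so p_Y = N(0, 1 + sigma^2).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    w = rng.standard_normal(n)
    y = x + sigma * w

    true_score = -y / (1.0 + sigma**2)   # exact score of p_Y at y
    dsm_target = -(y - x) / sigma**2     # grad_y log p_{Y|X}(y | x), Gaussian noise
    tsm_target = -x                      # grad_x log p_X(x), known target score

    dsm_mse = np.mean((dsm_target - true_score) ** 2)
    tsm_mse = np.mean((tsm_target - true_score) ** 2)
    return dsm_mse, tsm_mse


if __name__ == "__main__":
    for sigma in (0.1, 1.0, 10.0):
        dsm, tsm = target_mse(sigma)
        print(f"sigma={sigma:5.1f}  DSM target MSE={dsm:.4g}  TSM target MSE={tsm:.4g}")
```

In this toy setting both targets have conditional mean equal to the exact score, and their mean squared deviations work out to $1/(\sigma^2(1+\sigma^2))$ for the DSM target and $\sigma^2/(1+\sigma^2)$ for the TSM target. The DSM target therefore degrades as $\sigma \to 0$ while the TSM target improves, and the ordering reverses at large $\sigma$.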

Abstract

Denoising Score Matching estimates the score of a noised version of a target distribution by minimizing a regression loss and is widely used to train the popular class of Denoising Diffusion Models. A well-known limitation of Denoising Score Matching, however, is that it yields poor estimates of the score at low noise levels. This issue is particularly unfavourable for problems in the physical sciences and for Monte Carlo sampling tasks for which the score of the clean original target is known. Intuitively, estimating the score of a slightly noised version of the target should be a simple task in such cases. In this paper, we address this shortcoming and show that it is indeed possible to leverage knowledge of the target score. We present a Target Score Identity and corresponding Target Score Matching regression loss which allow us to obtain score estimates admitting favourable properties at low noise levels.

Paper Structure (22 sections, 8 theorems, 74 equations, 5 figures)

Introduction and Motivation
Denoising Score Identity and Denoising Score Matching
Limitations
Target Score Identity and Target Score Matching
Extensions
Extension to non-Additive Noise
Extension to Lie groups
Extension to Bridge Matching
Experiments
Analytic estimators
Trained score models
Proofs of the Main Results
Proof of Proposition \ref{prop:ti_identity}
First proof.
Second proof.
...and 7 more sections

Key Result

Proposition 2.1

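For the additive noise model $Y = X + W$ (eq:additive), with the noise $W$ independent of $X$, the following Target Score Identity holds: $\nabla \log p_Y(y)=\int \nabla \log p_X(x)\, p_{X|Y}(x|y)\, \mathrm{d}x$.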
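One way to see why an identity of this form is plausible is the short derivation sketched below. It is not necessarily either of the paper's two proofs. It assumes that $W$ is independent of $X$ with density $p_W$, that $p_X$ is differentiable, and that differentiation may be exchanged with integration.

```latex
\begin{align*}
p_Y(y) &= \int p_X(y - w)\, p_W(w)\, \mathrm{d}w
  && \text{(convolution, since } Y = X + W\text{)} \\
\nabla p_Y(y) &= \int \nabla p_X(y - w)\, p_W(w)\, \mathrm{d}w
  && \text{(differentiate under the integral)} \\
&= \int \nabla \log p_X(x)\, p_X(x)\, p_W(y - x)\, \mathrm{d}x
  && \text{(substitute } x = y - w\text{)} \\
&= \int \nabla \log p_X(x)\, p_X(x)\, p_{Y|X}(y|x)\, \mathrm{d}x
  && \text{(since } p_{Y|X}(y|x) = p_W(y - x)\text{)} \\
\nabla \log p_Y(y) &= \frac{\nabla p_Y(y)}{p_Y(y)}
  = \int \nabla \log p_X(x)\, p_{X|Y}(x|y)\, \mathrm{d}x
  && \text{(Bayes' rule)}
\end{align*}
```

The derivative lands on $p_X$ rather than on the noise density, which is why the resulting target involves $\nabla \log p_X(x)$ and carries no explicit factor of the inverse noise variance.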

Figures (5)

Figure 1: Target distributions (top panel), and the mixture weights $\kappa_t$ and $\bar{\kappa}_t$ through time induced by these targets (bottom panel).
Figure 2: The estimated variance of each estimator based on the score identities, computed using 10,000 samples of the estimator. For each estimator sample, $X_t$ is sampled from $p_t$. For each $X_t$, we use 100 samples from $p_{0|t}$ to estimate the score.
Figure 3: The different weighting functions across time, for $\sigma_\text{data}^2 = 1$.
Figure 4: Comparison of the distribution of training losses for the combinations of the 4 target densities, 4 training losses, and 4 weighting functions.
Figure 5: (Left) Mean of the regression loss $\| s_\theta(t, X_t) - L \|^2$ with $L \in \{L_\mathrm{DSM}, L_\mathrm{TSM}, L_{\kappa_t}, L_{\bar{\kappa}_t} \}$ across training iterations. (Right) MMD distance between the empirical data distribution and generated samples with score $s_\theta$ for an RBF kernel.

Theorems & Definitions (8)

Proposition 2.1
Corollary 2.2
Proposition 2.3
Corollary 2.4
Proposition 3.1
Proposition 3.2
Proposition 3.3
Proposition 3.4
