The Score-Difference Flow for Implicit Generative Modeling

Romann M. Weber

TL;DR

The paper introduces the score-difference (SD) flow, a deterministic flow that aligns an arbitrary source distribution $q$ with a target distribution $p$ by moving source samples along the score difference $\nabla\log p - \nabla\log q$. The flow is derived from probability-flow dynamics and stochastic differential equations. The paper shows that small perturbations along the SD direction optimally reduce the KL divergence $\mathbb{D}_{\mathrm{KL}}(q \,\|\, p)$ and relates this reduction to the Fisher divergence. Because $p$ and $q$ are generally intractable, the paper replaces them with noise-corrupted proxy distributions and shows that the proxies are aligned if and only if the original distributions are aligned. It also gives denoiser-based and kernel-based formulations, which connect SD flow to denoising diffusion models and to kernel-based methods. The paper further shows that SD flow arises in GAN training under certain choices of loss function when the discriminator is optimal, which gives a unified view linking diffusion models and GANs. Particle-optimization and model-optimization algorithms demonstrate robustness on toy data sets. The paper argues that SD flow links model classes that individually address high sample quality, mode coverage, and fast sampling, setting the stage for unified, efficient generative modeling approaches.
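
The link between the SD direction, the KL divergence, and the Fisher divergence can be illustrated with a short calculation. This is a sketch and not the paper's own derivation: it assumes that the evolving source density $q_t$ is transported by a velocity field $v$ according to the continuity equation and that boundary terms vanish when integrating by parts. The symbols $q_t$ and $v$ are introduced here for illustration.

```latex
\begin{align}
\partial_t q_t &= -\nabla \cdot (q_t\, v), \\
\frac{d}{dt}\, \mathbb{D}_{\mathrm{KL}}(q_t \,\|\, p)
  &= \int \partial_t q_t \left( \log \frac{q_t}{p} + 1 \right) dx
   = \int q_t \, v^\top \nabla \log \frac{q_t}{p} \, dx \\
  &= -\,\mathbb{E}_{q_t}\!\left[ v^\top \left( \nabla \log p - \nabla \log q_t \right) \right].
\end{align}
% Choosing the score difference as the velocity field:
\begin{equation}
v = \nabla \log p - \nabla \log q_t
\quad\Longrightarrow\quad
\frac{d}{dt}\, \mathbb{D}_{\mathrm{KL}}(q_t \,\|\, p)
  = -\,\mathbb{E}_{q_t}\!\left[ \left\| \nabla \log p - \nabla \log q_t \right\|^2 \right] \le 0 .
\end{equation}
```

Under these assumptions the KL divergence never increases along the flow, and its rate of decrease is the expected squared norm of the score difference, which equals the Fisher divergence between $q_t$ and $p$ up to a convention-dependent constant factor.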

Abstract

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.
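
To make the idea of flowing along a score difference between proxy distributions concrete, here is a minimal particle sketch. It is not the paper's algorithm: it assumes that both proxies are the empirical distributions smoothed with a Gaussian kernel of fixed width `sigma`, for which the score has the closed form (kernel-weighted mean of the data minus `x`) divided by `sigma**2`. The function names, the step size, and the two-Gaussian demo are hypothetical choices made for illustration.

```python
import numpy as np


def kde_score(x, data, sigma):
    """Score (gradient of log density) at points x of the empirical
    distribution of `data` convolved with an isotropic Gaussian of width sigma."""
    # Pairwise squared distances, shape (n_x, n_data).
    d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    logw = -d2 / (2.0 * sigma**2)
    logw -= logw.max(axis=1, keepdims=True)  # stabilise the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    # Kernel-weighted mean of the data minus x, scaled by 1 / sigma^2.
    return (w @ data - x) / sigma**2


def sd_flow_step(x, target, sigma, step):
    """One explicit Euler step along the score difference of the two proxies."""
    sd = kde_score(x, target, sigma) - kde_score(x, x, sigma)
    return x + step * sd


def run_demo(n=256, steps=300, sigma=0.5, step=0.05, seed=0):
    """Move particles from an offset Gaussian toward a target Gaussian sample.
    Returns the distance between sample means before and after the flow."""
    rng = np.random.default_rng(seed)
    target = rng.normal(loc=[3.0, -2.0], scale=1.0, size=(n, 2))
    x = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(n, 2))
    start_gap = np.linalg.norm(x.mean(axis=0) - target.mean(axis=0))
    for _ in range(steps):
        x = sd_flow_step(x, target, sigma, step)
    end_gap = np.linalg.norm(x.mean(axis=0) - target.mean(axis=0))
    return start_gap, end_gap


if __name__ == "__main__":
    before, after = run_demo()
    print(f"mean gap before: {before:.3f}, after: {after:.3f}")
```

The target term pulls each particle toward nearby target points, while the source term pushes it away from the local mean of the other particles, which discourages collapse onto a few target points. A fixed `sigma` is used only to keep the sketch short.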

Paper Structure (33 sections, 47 equations, 5 figures, 4 tables, 2 algorithms)

Introduction
Probability Flow and the Score Difference
Derivation from Stochastic Differential Equations
SD Flow Optimally Reduces KL Divergence
Applying SD Flow to Proxy Distributions
Aligning Proxy Distributions
Limitations and Alternative Formulations of SD Flow
Relation to Denoising Diffusion Models
Implicit Flows in Generative Adversarial Networks
Decomposing Generator Training into Sub-problems
SD Flow in GANs
Algorithms
Particle Optimization
Model Optimization
Experiments
...and 18 more sections

Figures (5)

Figure 1: Evolution of synthetic data points from an offset base distribution toward the target distribution of 25 Gaussians over 1000 steps of SD flow (top row), MMD gradient flow (middle row) and SVGD (bottom row) in the no-AdaGrad, full-data, batched, and annealed condition (corresponding to the second row of Table \ref{tab:Conv25GG}). Only SD flow converged in this condition.
Figure 2: Evolution of synthetic data points from an offset base distribution toward the target distribution of 30 Gaussians over 1000 steps of SD flow (top row), MMD gradient flow (middle row) and SVGD (bottom row) in the no-AdaGrad, full-data, cosine-noise annealed condition (corresponding to the fourth row of Table \ref{tab:ConvQM}). Only SD flow converged in this condition.
Figure 3: Top: Data-set interpolation via evolution of 1024 points from the "Swiss roll" distribution to the "mystery" distribution in $\mathbb{R}^3$. Bottom: The reverse interpolation, from the "mystery" distribution to the "Swiss roll" distribution.
Figure 4: Distribution of distances from synthetic (blue) and target (red) data points to their first nearest neighbors in the target distribution.
Figure 5: Model optimization results in $\mathbb{R}^{50}$ using a constant noise schedule. SD flow allows a parametric model to be learned that very closely matches the target mean (${\bm{\mu}}$ versus $\hat{{\bm{\mu}}}$, left panel) and the elements of the covariance matrix (${\bm{B}} {\bm{B}}^\top$ versus $\hat{{\bm{B}}} \hat{{\bm{B}}}^\top$, center panel). Diagonals are included for reference. Nearest-neighbor analysis (right panel) showed no overfitting of the data, although the synthetic data had a slightly lower average distance to nearest neighbors in the target set than the target data had relative to itself.

Theorems & Definitions (2)

Remark 1
Remark 2
