Local Curvature Smoothing with Stein's Identity for Efficient Score Matching

Genki Osada, Makoto Shing, Takashi Nishide

TL;DR

This work targets the computational bottleneck of score-based diffusion models by addressing the Jacobian-trace term in score matching. The authors introduce Local Curvature Smoothing with Stein's identity (LCSS), which recasts the trace into an efficiently computable inner product via a Gaussian-averaged objective and Stein's identity, enabling regularization without enforcing affine SDEs. The time-conditioned LCSS objective integrates naturally into SDMs, supporting flexible forward processes and yielding faster training, stable convergence, and competitive or superior sample quality, including high-resolution generation up to $1024\times1024$. Empirically, LCSS outperforms SSM and FD-SSM in density estimation and training efficiency and matches or surpasses DSM in several qualitative and quantitative metrics, while avoiding DSM's affine-SDE constraint and associated instabilities. The method broadens the design space for SDMs by decoupling score matching from affine forward dynamics, with strong practical implications for scalable, high-fidelity image generation.

Abstract

Score-based diffusion models (SDMs) are trained by score matching. The difficulty with score matching is that its objective contains a Jacobian trace that is expensive to compute. Several methods have been proposed to avoid this computation, but each has drawbacks, such as unstable training or learning a denoising vector field in place of the true score. We propose a novel score matching variant, local curvature smoothing with Stein's identity (LCSS). LCSS bypasses the Jacobian trace by applying Stein's identity, which provides both effective regularization and efficient computation. We show that LCSS surpasses existing methods in sample generation performance and matches the performance of denoising score matching, which is adopted by most SDMs, on metrics such as FID, Inception score, and bits per dimension. Furthermore, we show that LCSS enables realistic image generation even at a high resolution of $1024 \times 1024$.
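
For orientation, the Jacobian-trace term referred to in the abstract can be seen in the standard (Hyvärinen) form of the score matching objective. This is a sketch in assumed notation, which may differ from the paper's: $s_\theta$ denotes the score model, $p_{\text{data}}$ the data distribution, and $d$ the data dimension.

```latex
\mathcal{J}_{\text{SM}}(\theta)
  = \mathbb{E}_{{\bf x} \sim p_{\text{data}}}
    \left[
      \operatorname{tr}\!\left( \nabla_{\bf x}\, s_\theta({\bf x}) \right)
      + \tfrac{1}{2} \left\lVert s_\theta({\bf x}) \right\rVert_2^2
    \right],
\qquad
\operatorname{tr}\!\left( \nabla_{\bf x}\, s_\theta({\bf x}) \right)
  = \sum_{i=1}^{d} \frac{\partial\, [s_\theta({\bf x})]_i}{\partial x_i}.
```

Evaluating the trace exactly requires every diagonal entry of the $d \times d$ Jacobian, which with reverse-mode differentiation generally means on the order of $d$ backward passes per sample. This is what makes the term costly for image-sized inputs.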

Paper Structure

This paper contains 34 sections, 4 theorems, 25 equations, 10 figures, 4 tables.

Key Result

Lemma 1

Score matching with local curvature smoothing (Definition 1) is equivalent to the expectation of $\mathcal{J}_\text{SM}^{s} (\theta, {\bf x})$ over a Gaussian distribution centered at ${\bf x}$, i.e., ${\bf x}' \sim \mathcal{N}({\bf x}, \sigma^{2} \mathbb{I}_{\text{d}})$: where $\epsilon := \left\lVert{\bf x}' - {\bf x}\right\rVert_{2}$.

Figures (10)

  • Figure 1: Samples generated from models trained on CelebA-HQ ($1024 \times 1024$) using our proposed score matching method, LCSS. The rightmost images in each row are generated by DDPM++ with subVP SDE, while the rest are by NCSN++ with VE SDE.
  • Figure 2: Comparison of sample quality in the early stages of training. The model is NCSNv2 trained on CIFAR-10. The left three panels show samples generated after 5k training steps, while the right three show samples generated after 90k training steps.
  • Figure 3: Comparison of generated samples on CelebA $(64 \times 64)$. The left three show samples from models trained for 10k steps. In the right three, FD-SSM and LCSS images are from models trained for 210k steps, whereas SSM images are from a model trained for 60k steps.
  • Figure 4: Samples on FFHQ $(256 \times 256)$. Models are trained for 600k steps with batch size 16. SSM and FD-SSM fail to produce face images.
  • Figure 5: Samples on AFHQ $(256 \times 256)$. Models are trained in the same setting as those on FFHQ.
  • ...and 5 more figures

Theorems & Definitions (6)

  • Definition 1: Score matching with local curvature smoothing [NIPS2010_6f3e29a3]
  • Lemma 1 [NIPS2010_6f3e29a3]
  • Definition 2: Stein class [stein1972bound]
  • Lemma 2: Stein's identity [pmlr-v48-liub16, NIPS2015_698d51a1]
  • Corollary 1 [li2018gradient]
  • Corollary 2: Bypassing Jacobian trace computation
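
The idea behind Corollary 2 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation. It checks the Gaussian form of Stein's identity, $\mathbb{E}_{\bf z}[\operatorname{tr}(\nabla s({\bf x} + \sigma {\bf z}))] = \mathbb{E}_{\bf z}[{\bf z}^\top s({\bf x} + \sigma {\bf z})] / \sigma$ with ${\bf z} \sim \mathcal{N}(0, \mathbb{I}_{\text{d}})$. The function `stein_trace_estimate` and the linear toy "score" `linear_score` are hypothetical names introduced only for this example. A linear map is chosen because its Jacobian trace is known exactly.

```python
import numpy as np


def stein_trace_estimate(score_fn, x, sigma, n_samples, rng):
    """Monte Carlo estimate of E_z[tr(J_s(x + sigma * z))], z ~ N(0, I).

    Uses the Gaussian Stein identity, so only evaluations of score_fn are
    needed and no Jacobian is ever formed.
    """
    z = rng.standard_normal((n_samples, x.shape[0]))
    x_perturbed = x + sigma * z                 # samples x' ~ N(x, sigma^2 I)
    s = score_fn(x_perturbed)                   # shape (n_samples, d)
    inner = np.sum(z * s, axis=1)               # z^T s(x') for each sample
    return np.mean(inner) / sigma


# Hypothetical toy "score": s(x) = A x + b, whose Jacobian is A everywhere,
# so the smoothed trace equals tr(A) exactly.
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
b = rng.standard_normal(d)


def linear_score(xs):
    # xs has shape (n_samples, d); apply A to each row and add b.
    return xs @ A.T + b


x = rng.standard_normal(d)
estimate = stein_trace_estimate(linear_score, x, sigma=0.5,
                                n_samples=200_000, rng=rng)
exact = np.trace(A)
print(f"Stein estimate: {estimate:.3f}   exact trace: {exact:.3f}")
```

The estimator replaces the trace with an inner product between the perturbation and the score at the perturbed point, so it costs one forward evaluation per sample. In this toy setting its variance grows as $\sigma$ shrinks, because of the $1/\sigma$ factor, so the choice of $\sigma$ trades smoothing bias against estimator noise.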