Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions

Kaichen Shen, Wei Zhu

TL;DR

The paper addresses learning structured distributions with score-based models by combining nonlinear denoising score matching (NDSM) with the latent score-based generative model (LSGM) framework in a single objective, LNDSM. By reformulating the VAE cross-entropy term via the Euler-Maruyama discretization and removing two zero-mean but variance-exploding terms, LNDSM enables stable, joint training of the encoder, decoder, and latent score network without time-wise importance sampling. With a Gaussian mixture latent prior, LNDSM-SGM captures multimodality and symmetry more effectively, achieving lower FID and higher IS than prior methods on MNIST variants, including low-data regimes, while reducing overall training time. The results highlight the value of combining nonlinear latent dynamics with structured priors to improve both efficiency and fidelity in structured distribution learning.

Abstract

We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based latent SGM framework. This combination is achieved by reformulating the cross-entropy term using the approximate Gaussian transition induced by the Euler-Maruyama scheme. To ensure numerical stability, we identify and remove two zero-mean but variance-exploding terms arising from small time steps. Experiments on variants of the MNIST dataset demonstrate that the proposed method achieves faster synthesis and enhanced learning of inherently structured distributions. Compared to benchmark structure-agnostic latent SGMs, LNDSM consistently attains superior sample quality and variability.
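The Euler-Maruyama step and its Gaussian transition can be illustrated with a small sketch. This is not the paper's implementation: the one-dimensional setting, the choice of drift (the score of an equal-weight Gaussian mixture, i.e. Langevin dynamics), the $\sqrt{2}$ diffusion coefficient, and every function name below are assumptions made for illustration only. The names $z_n$, $U_n$, $n_f$ and $N$ follow Theorem 3.1.

```python
import math
import random


def gmm_score(z, means, std):
    """Score d/dz log p(z) of an equal-weight 1-D Gaussian mixture (assumed prior)."""
    logs = [-0.5 * ((z - m) / std) ** 2 for m in means]
    top = max(logs)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return sum(w * (m - z) for w, m in zip(weights, means)) / (total * std ** 2)


def em_forward(z0, n_f, h, means, std, rng):
    """Euler-Maruyama path z_0..z_{n_f} for the assumed SDE dz = score(z) dt + sqrt(2) dW."""
    path, noises = [z0], []
    for _ in range(n_f):
        u = rng.gauss(0.0, 1.0)  # U_n ~ N(0, 1)
        z = path[-1]
        path.append(z + h * gmm_score(z, means, std) + math.sqrt(2.0 * h) * u)
        noises.append(u)
    return path, noises


def transition_score(z_prev, z_next, h, means, std):
    """Score, with respect to z_next, of the Gaussian EM transition
    N(z_prev + h * drift(z_prev), 2h)."""
    mean = z_prev + h * gmm_score(z_prev, means, std)
    return -(z_next - mean) / (2.0 * h)


# Hypothetical settings, chosen only for the demonstration.
rng = random.Random(0)
means, std, h, n_f = [-2.0, 2.0], 0.5, 1e-3, 1000
path, noises = em_forward(0.3, n_f, h, means, std, rng)

n = rng.randint(1, n_f)  # N uniform on {1, ..., n_f}
target = transition_score(path[n - 1], path[n], h, means, std)
```

Under these assumptions the one-step target equals $-U_n/\sqrt{2h}$, so its magnitude grows like $h^{-1/2}$ as the step size shrinks. This is one way to see why small time steps can inflate the variance of a denoising objective; the specific terms the paper removes are not reproduced on this page.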

Paper Structure

This paper contains 23 sections, 2 theorems, 56 equations, 3 figures, 3 tables.

Key Result

Theorem 3.1

Let $\{z_{n}\}_{n=0}^{n_{f}}$ be the Markov process defined by the EM scheme (eq:EM Markov process 1, eq:EM Markov process 2). Let $N$ be a random variable uniformly distributed on $\{1, \ldots, n_{f}\}$ and independent of $z_{0}$ and the $U_{n}$'s. Then the cross-entropy term in the VAE training loss can be …

Figures (3)

  • Figure 1: LSGM and LNDSM-SGM trained using the full MNIST dataset. (a) and (d): Snapshot and fraction of different digits of the training samples. (b) and (e): Snapshot and fraction of different digits of the 10,000 samples generated by the LSGM. The KL divergence from the fraction of the training samples to the fraction of the generated samples: 0.008. (c) and (f): Snapshot and fraction of different digits of the 10,000 samples generated by the LNDSM-SGM. The KL divergence from the fraction of the training samples to the fraction of the generated samples: 0.00114.
  • Figure 2: LSGM and LNDSM-SGM trained using the low-data MNIST dataset. (a) and (d): Snapshot and fraction of different digits of the training samples. (b) and (e): Snapshot and fraction of different digits of the 10,000 samples generated by the LSGM. The KL divergence from the fraction of the training samples to the fraction of the generated samples: 0.00883. (c) and (f): Snapshot and fraction of different digits of the 10,000 samples generated by the LNDSM-SGM. The KL divergence from the fraction of the training samples to the fraction of the generated samples: 0.00439.
  • Figure 3: LSGM and LNDSM-SGM trained using the full Approx.-C2-MNIST dataset. (a) and (d): Snapshot and fraction of different digits of the training samples. (b) and (e): Snapshot and fraction of different digits of the 10,000 samples generated by the LSGM. (c) and (f): Snapshot and fraction of different digits of the 10,000 samples generated by the LNDSM-SGM.
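
The KL divergence between digit fractions quoted in the captions of Figures 1 and 2 can be computed along the following lines. This is a minimal sketch, not the authors' evaluation code: the direction KL(train || generated) is an assumption about how the captions should be read, the helper names are hypothetical, and how generated samples are assigned digit labels is not stated on this page, so the labels are taken as given.

```python
import math
from collections import Counter


def digit_fractions(labels, num_classes=10):
    """Fraction of samples falling in each class 0..num_classes-1."""
    counts = Counter(labels)
    n = len(labels)
    return [counts.get(k, 0) / n for k in range(num_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_k p_k * log(p_k / q_k); classes with p_k = 0 contribute 0.
    q_k is floored at eps to avoid division by zero."""
    return sum(pk * math.log(pk / max(qk, eps)) for pk, qk in zip(p, q) if pk > 0)


# Toy labels for illustration only (three classes instead of ten digits).
train_labels = [0, 1, 1, 2, 2, 2]
gen_labels = [0, 0, 1, 1, 2, 2]

p = digit_fractions(train_labels, num_classes=3)
q = digit_fractions(gen_labels, num_classes=3)
kl = kl_divergence(p, q)
```

A value near zero indicates that the generated samples reproduce the class proportions of the training set. The measure compares class proportions only and says nothing about the visual quality of individual samples.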

Theorems & Definitions (4)

  • Theorem 3.1
  • Lemma 1
  • proof
  • proof