Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions
Kaichen Shen, Wei Zhu
TL;DR
The paper tackles learning structured distributions with score-based models by combining nonlinear denoising score matching (NDSM) with the latent score-based generative model (LSGM) framework, yielding LNDSM. By reformulating the VAE cross-entropy term via the Euler–Maruyama discretization and removing zero-mean but variance-exploding terms, LNDSM enables stable, joint training of encoder, decoder, and latent score networks without time-wise importance sampling. Using a Gaussian mixture latent prior, LNDSM-SGM captures multimodality and symmetry more effectively, achieving lower FID and higher IS than prior methods on MNIST variants, including low-data regimes, while reducing overall training time. The approach highlights the value of combining nonlinear latent dynamics with structured priors to improve efficiency and fidelity in structured distribution learning.
Abstract
We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based latent SGM framework. This combination is achieved by reformulating the cross-entropy term using the approximate Gaussian transition induced by the Euler–Maruyama scheme. To ensure numerical stability, we identify and remove two zero-mean but variance-exploding terms that arise from small time steps. Experiments on variants of the MNIST dataset demonstrate that the proposed method achieves faster synthesis and enhanced learning of inherently structured distributions. Compared to benchmark structure-agnostic latent SGMs, LNDSM consistently attains superior sample quality and variability.
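To make the "approximate Gaussian transition induced by the Euler–Maruyama scheme" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic SDE dx = f(x, t) dt + g(t) dW; the double-well drift, the constant diffusion coefficient, and all function names are hypothetical choices made only for illustration. One Euler–Maruyama step gives a transition that is Gaussian with mean x + f(x, t) Δt and standard deviation g(t) √Δt, whose score is available in closed form.

```python
import numpy as np

def drift(x, t):
    # Hypothetical nonlinear drift (double-well); an assumption, not taken from the paper.
    return x - x**3

def diffusion(t):
    # Hypothetical constant diffusion coefficient; also an assumption.
    return np.sqrt(2.0)

def em_step(x, t, dt, rng):
    """One Euler-Maruyama step.

    Returns the new state together with the mean and standard deviation
    of the Gaussian transition N(mean, std^2 I) that the step induces.
    """
    mean = x + drift(x, t) * dt
    std = diffusion(t) * np.sqrt(dt)
    x_next = mean + std * rng.standard_normal(x.shape)
    return x_next, mean, std

def transition_score(x_next, mean, std):
    # Gradient of log N(x_next; mean, std^2 I) with respect to x_next.
    return -(x_next - mean) / std**2

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 2))           # a small batch of 2-D latent points
x1, mean, std = em_step(x0, 0.0, 1e-2, rng)
score = transition_score(x1, mean, std)    # closed-form regression target
```

In this sketch the transition score equals the sampled noise divided by -std, so its magnitude grows like 1/√Δt as the step shrinks. This only illustrates why small time steps can inflate the variance of such targets; it does not reproduce the two specific terms the paper identifies and removes.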
