
Likelihood Matching for Diffusion Models

Lei Qian, Wu Su, Yanqi Huang, Song Xi Chen

TL;DR

Likelihood Matching (LM) reframes diffusion model training as direct likelihood optimization by exploiting the equivalence between the likelihood of the data distribution and the likelihood of the reverse-time sample path, and it uses a quasi-maximum likelihood approach with Gaussian proxies for the reverse transitions. Score and Hessian functions are learned by neural networks to perform joint score and covariance matching, yielding an LM objective that directly targets the data likelihood. The authors establish consistency and non-asymptotic convergence guarantees for the LM sampler and report empirical gains over score matching on synthetic mixtures and image datasets, with ablations highlighting the benefit of Hessian information and of practical rank settings. Overall, LM offers a principled, likelihood-based alternative to score matching that improves sample fidelity and provides a scalable training paradigm for diffusion models.
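To make the quasi-likelihood idea concrete, here is a minimal NumPy sketch of the Gaussian quasi-negative-log-likelihood of a discretized reverse path. It assumes each reverse transition is proxied by a Gaussian whose mean and covariance are supplied directly as arrays. In LM these would be produced from the learned score and Hessian networks. The names `gaussian_nll` and `path_quasi_nll` are hypothetical, and this illustrates the criterion rather than reproducing the authors' training objective.

```python
import numpy as np

def gaussian_nll(x, mean, cov):
    """Negative log-density of N(mean, cov) at x (x and mean have shape (d,))."""
    d = x.shape[0]
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return 0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)

def path_quasi_nll(path, means, covs):
    """Sum of Gaussian transition NLLs along a reverse path (hypothetical helper).

    path:  (K+1, d) array, path[k] is the state at the k-th reverse step
    means: (K, d) array,   means[k] is the proxy mean of path[k+1] given path[k]
    covs:  (K, d, d) array, covs[k] is the proxy covariance of that transition
    """
    return sum(gaussian_nll(path[k + 1], means[k], covs[k]) for k in range(len(means)))

# Usage: a 2-step path in R^2 with zero residuals and identity covariances.
path = np.zeros((3, 2))
means = np.zeros((2, 2))
covs = np.stack([np.eye(2), np.eye(2)])
print(path_quasi_nll(path, means, covs))  # 2 * log(2 * pi), about 3.676
```

For a scalar residual $r$, the Gaussian negative log-density as a function of the variance $v$ is $\tfrac12(\log 2\pi + \log v + r^2/v)$, which is minimized at $v = r^2$ (in expectation, at the true conditional variance). This is why maximizing a Gaussian quasi-likelihood matches the covariance as well as the mean, whereas a squared-error objective on the mean alone does not.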

Abstract

We propose a Likelihood Matching approach for training diffusion models by first establishing an equivalence between the likelihood of the target data distribution and a likelihood along the sample path of the reverse diffusion. To compute the reverse sample-path likelihood efficiently, we consider a quasi-likelihood that approximates each reverse transition density by a Gaussian distribution with matched conditional mean and covariance. The score and Hessian functions for diffusion generation are estimated by maximizing this quasi-likelihood, which ensures consistent matching of the first two transition moments between every two time points. To facilitate computation, we introduce a stochastic sampler that leverages both the estimated score and Hessian information. We establish consistency of the quasi-maximum likelihood estimator and provide non-asymptotic convergence guarantees for the proposed sampler, quantifying the rates of the approximation errors due to score and Hessian estimation, dimensionality, and the number of diffusion steps. Empirical and simulation evaluations demonstrate the effectiveness of the proposed Likelihood Matching and validate the theoretical results.
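The abstract's sampler uses both score and Hessian estimates. One standard way to turn them into a Gaussian reverse transition is through the second-order Tweedie formulas for $\mathbb{E}[x_0\mid x_t]$ and $\mathrm{Cov}[x_0\mid x_t]$, mixed over the DDPM-style posterior $q(x_s\mid x_t, x_0)$. The sketch below assumes a Gaussian Markov forward process with marginals $x_u = a_u x_0 + \sigma_u\varepsilon$, for example a variance-preserving SDE driven by $\beta_t$. This is an assumption for illustration, not necessarily the paper's exact sampler, and all function names are hypothetical.

```python
import numpy as np

def tweedie_moments(x_t, score, hessian, a_t, sigma_t):
    """Second-order Tweedie: E[x_0 | x_t] and Cov[x_0 | x_t] for x_t = a_t x_0 + sigma_t eps."""
    d = x_t.shape[0]
    mean0 = (x_t + sigma_t**2 * score) / a_t
    cov0 = sigma_t**2 * (np.eye(d) + sigma_t**2 * hessian) / a_t**2
    return mean0, cov0

def reverse_moments(x_t, score, hessian, a_t, sigma_t, a_s, sigma_s):
    """Mean and covariance of x_s | x_t (s < t), mixing the Gaussian posterior
    q(x_s | x_t, x_0) over the Tweedie moments of x_0 | x_t."""
    d = x_t.shape[0]
    a_ts = a_t / a_s                              # scale of the s -> t transition
    var_ts = sigma_t**2 - a_ts**2 * sigma_s**2    # variance of the s -> t transition
    c_t = a_ts * sigma_s**2 / sigma_t**2          # weight on x_t in q(x_s | x_t, x_0)
    c_0 = a_s * var_ts / sigma_t**2               # weight on x_0 in q(x_s | x_t, x_0)
    var_q = var_ts * sigma_s**2 / sigma_t**2      # variance of q(x_s | x_t, x_0)
    mean0, cov0 = tweedie_moments(x_t, score, hessian, a_t, sigma_t)
    mean = c_t * x_t + c_0 * mean0                # law of total expectation
    cov = c_0**2 * cov0 + var_q * np.eye(d)       # law of total covariance
    return mean, cov

def reverse_step(x_t, score, hessian, a_t, sigma_t, a_s, sigma_s, rng):
    """One stochastic step: draw x_s from the Gaussian proxy of x_s | x_t."""
    mean, cov = reverse_moments(x_t, score, hessian, a_t, sigma_t, a_s, sigma_s)
    return rng.multivariate_normal(mean, cov)

# Usage: 1D Gaussian data x_0 ~ N(m, s^2), for which score and Hessian are exact.
m, s = 1.5, 0.7
a_t, sigma_t = 0.6, 0.8                  # variance-preserving: a^2 + sigma^2 = 1
a_s, sigma_s = 0.9, np.sqrt(1 - 0.9**2)
x_t = np.array([0.3])
v_t = a_t**2 * s**2 + sigma_t**2         # marginal variance of x_t
score = -(x_t - a_t * m) / v_t
hessian = np.array([[-1.0 / v_t]])
mean, cov = reverse_moments(x_t, score, hessian, a_t, sigma_t, a_s, sigma_s)

# Direct Gaussian conditioning of x_s on x_t, for comparison.
v_s = a_s**2 * s**2 + sigma_s**2
c_st = (a_t / a_s) * v_s                 # Cov(x_s, x_t)
print(mean, a_s * m + c_st / v_t * (x_t - a_t * m))
print(cov, v_s - c_st**2 / v_t)
x_s = reverse_step(x_t, score, hessian, a_t, sigma_t, a_s, sigma_s, np.random.default_rng(0))
```

For Gaussian data the score and Hessian are exact, so the proxy's moments coincide with direct conditioning. With learned networks, passing a zero `hessian` gives a score-only variant, analogous to how Figure 4 runs Score Matching with the Hessian set to zero.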

Paper Structure

This paper contains 36 sections, 10 theorems, 93 equations, 5 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

Suppose that there exists a positive constant $C$ such that $0 < \beta_t \leq C$ for all $t \in [0,T]$, and that for any open bounded set $\mathcal{O} \subseteq \mathbb{R}^d$, $\int_0^T \int_{\mathcal{O}} \big(\lVert q_t(x;\theta)\rVert^2 + d\cdot\beta_t \lVert \nabla q_t(x;\theta)\rVert^2\big)\,\mathrm{d}x\,\mathrm{d}t < \infty$; then for any $0 < t_1 < \cdots < t_{N-1} < T$.

Figures (5)

  • Figure 1: Illustration of Score Matching (a) versus Likelihood Matching (b) methods. The proposed Likelihood Matching captures a richer set of transition densities while incorporating both score matching and covariance matching, whereas Score Matching exclusively focuses on a single transition density and utilizes only first-order moment information.
  • Figure 2: Maximum Mean Discrepancy (MMD; lower is better) between generated and true samples as a function of the number of sampling steps $N$, for two 1D mixture distributions: (a) a Gaussian mixture and (b) a mixture of $t$ distributions with 3 degrees of freedom. (c) Fréchet Inception Distance (FID; lower is better) on the MNIST dataset for different combinations of $(N, r)$ under the Likelihood Matching framework.
  • Figure 3: Comparison of original and synthetic data. (a) Kernel Density Estimations (KDE) for the 1-dimensional case. (b) Clustering results for the 2-dimensional case.
  • Figure 4: Sampling on MNIST. Both Likelihood Matching and Score Matching use the proposed stochastic sampler, with the Hessian function set to zero in the case of Score Matching.
  • Figure 5: Unconditional samples generated by the proposed method on 32$\times$32 CIFAR10 (top two rows), 64$\times$64 CelebA (upper middle), 64$\times$64 LSUN Church (lower middle), and 64$\times$64 LSUN Bedroom (bottom row).
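Figure 2 compares samplers by the MMD between generated and true samples. Below is a minimal sketch of the squared MMD with a Gaussian kernel. The kernel, the bandwidth, and the biased (V-statistic) estimator are assumptions, because the caption does not specify them. The two-component mixture is a hypothetical stand-in for the paper's synthetic data.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between 1D sample arrays x and y."""
    diff = x[:, None] - y[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD; always >= 0."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy

# Usage: a hypothetical 1D two-component Gaussian mixture as the "true" data.
rng = np.random.default_rng(0)
centers = rng.choice([-2.0, 2.0], size=500)
x_true = centers + 0.5 * rng.standard_normal(500)
close = x_true + 0.1    # samples from a sampler with a small bias
far = x_true + 3.0      # samples from a sampler with a large bias
print(mmd2(x_true, close), mmd2(x_true, far))
```

The V-statistic is the squared distance between empirical kernel mean embeddings, so it is zero for identical samples and grows as the generated distribution drifts from the true one.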

Theorems & Definitions (11)

  • Proposition 1
  • Proposition 2
  • Theorem 1: Non-asymptotic Bound for Distributions with Bounded Moments
  • Theorem 2: Consistency under Oracle Model
  • Lemma 1
  • Lemma 2
  • Lemma 3: Lemma 11 in li_accelerating_2024
  • Lemma 4: Lemma 13 in li_accelerating_2024
  • Lemma 5
  • Lemma 6
  • ...and 1 more