Maximum Likelihood Training of Score-Based Diffusion Models

Yang Song; Conor Durkan; Iain Murray; Stefano Ermon

TL;DR

The work improves the likelihoods of score-based diffusion models by deriving a likelihood-weighted objective that upper-bounds the negative log-likelihood, enabling approximate maximum likelihood training at a cost comparable to standard score matching. It connects diffusion processes to continuous normalizing flows, provides KL-divergence and per-datapoint bounds, and introduces variance reduction (via importance sampling) and variational dequantization to further boost likelihoods. Empirically, likelihood weighting (combined with importance sampling and variational dequantization) consistently improves likelihoods across SDE families on CIFAR-10 and ImageNet 32×32, reaching 2.83 and 3.76 bits/dim respectively, while offering a competitive trade-off with sample quality. The results position score-based diffusion models as competitive with normalizing flows in tractable likelihood, while highlighting limitations such as slower sampling and uncertain transfer to discrete data.
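The TL;DR mentions importance sampling as a variance-reduction device for the likelihood-weighted objective, whose weight is λ(t) = g(t)². A minimal sketch, assuming a VP SDE with a linear β(t) schedule (β_min = 0.1, β_max = 20, time truncation ε = 1e-5; all assumed values), draws t with density proportional to g(t)²/σ(t)², whose CDF can be inverted in closed form for this SDE, and uses those samples to estimate the loss. The function names and the score model `score_fn` are hypothetical; this illustrates the idea and is not the paper's implementation.

```python
import numpy as np

# Assumed VP SDE hyperparameters (illustrative choices, not stated in the text above).
BETA_MIN, BETA_MAX, EPS, T = 0.1, 20.0, 1e-5, 1.0


def beta(t):
    """g(t)^2 for the VP SDE with a linear schedule."""
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def integrated_beta(t):
    """B(t) = integral of beta(s) from 0 to t."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t ** 2


def sigma2(t):
    """Variance sigma(t)^2 of the perturbation kernel p_0t(x_t | x_0)."""
    return -np.expm1(-integrated_beta(t))


def log_expm1(b):
    """Numerically stable log(exp(b) - 1) for b > 0."""
    return b + np.log(-np.expm1(-b))


def sample_t(n, rng):
    """Sample t in [EPS, T] with density proportional to g(t)^2 / sigma(t)^2.

    For the VP SDE the CDF is (log_expm1(B(t)) - log_expm1(B(EPS))) / Z,
    so it can be inverted in closed form. Returns the samples and Z.
    """
    lo, hi = log_expm1(integrated_beta(EPS)), log_expm1(integrated_beta(T))
    Z = hi - lo
    u = rng.uniform(size=n)
    b = np.logaddexp(0.0, lo + u * Z)  # invert log(exp(B) - 1) = lo + u * Z
    d = BETA_MAX - BETA_MIN
    t = (-BETA_MIN + np.sqrt(BETA_MIN ** 2 + 2.0 * d * b)) / d  # solve B(t) = b
    return t, Z


def likelihood_weighted_dsm(score_fn, x0, rng):
    """Importance-sampled estimate of the likelihood-weighted denoising
    score matching loss (constant factors omitted). score_fn(x, t) is a
    hypothetical score model s_theta."""
    t, Z = sample_t(x0.shape[0], rng)
    mean_coef = np.exp(-0.5 * integrated_beta(t))[:, None]
    std = np.sqrt(sigma2(t))[:, None]
    z = rng.standard_normal(x0.shape)
    xt = mean_coef * x0 + std * z
    # grad log p_0t(x_t | x_0) = -z / std, so
    # g^2 ||s + z/std||^2 = (g^2 / std^2) ||std * s + z||^2; dividing by the
    # proposal density (g^2 / std^2) / Z leaves Z * ||std * s + z||^2.
    residual = std * score_fn(xt, t) + z
    return Z * np.mean(np.sum(residual ** 2, axis=1))
```

As a sanity check, if the data is a point mass at the origin, the exact score is `-x / sigma2(t)`; passing it as `score_fn` with `x0 = np.zeros((n, d))` makes every residual, and hence the estimate, zero up to rounding. The design point is that g(t)²/σ(t)² grows roughly like 1/t near t = 0 for this SDE, so sampling t uniformly would give a high-variance estimator, whereas the proposal above absorbs that factor into the sampling distribution.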

Abstract

Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32x32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.
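The abstract relies on computing log-likelihoods through the connection to continuous normalizing flows, i.e. by integrating the probability flow ODE together with the divergence of its drift. The sketch below is an illustrative toy, not the authors' code: it assumes a VP SDE with a linear β schedule and Gaussian "data" N(0, S0² I), for which the exact score is known, integrates the augmented ODE with fixed-step RK4, and reports the result in bits per dimension. The names (`ode_drift`, `ode_log_likelihood`, `bits_per_dim`, `S0`) and hyperparameters are hypothetical; real models replace the exact score with a trained network and typically estimate the divergence stochastically, and reported image benchmarks also include dequantization and data-rescaling terms that are left out here.

```python
import numpy as np

# Assumed VP SDE schedule and toy data scale (illustrative, not from the text above).
BETA_MIN, BETA_MAX = 0.1, 20.0
S0 = 0.5  # std of the toy data distribution N(0, S0^2 I)


def beta(t):
    return BETA_MIN + t * (BETA_MAX - BETA_MIN)


def marginal_var(t):
    """Variance of p_t when p_0 = N(0, S0^2 I) is diffused by the VP SDE."""
    B = BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t ** 2
    return np.exp(-B) * S0 ** 2 + (1.0 - np.exp(-B))


def ode_drift(x, t):
    """Probability flow ODE drift f(x, t) - 0.5 g(t)^2 * score(x, t),
    using the exact score -x / marginal_var(t) of the toy Gaussian."""
    return -0.5 * beta(t) * x + 0.5 * beta(t) * x / marginal_var(t)


def drift_divergence(x, t):
    # The drift is linear in x, so its divergence is D times the coefficient.
    return x.size * (-0.5 * beta(t) * (1.0 - 1.0 / marginal_var(t)))


def log_normal(x, var):
    """log N(x; 0, var * I)."""
    return -0.5 * (x.size * np.log(2 * np.pi * var) + np.sum(x ** 2) / var)


def ode_log_likelihood(x0, n_steps=1000, T=1.0):
    """log p_0(x0) = log p_T(x_T) + integral of div(drift) along the path,
    integrated with RK4 on the augmented state (x, accumulated divergence)."""
    def f(state, t):
        x = state[:-1]
        return np.append(ode_drift(x, t), drift_divergence(x, t))

    state, h = np.append(x0, 0.0), T / n_steps
    for i in range(n_steps):
        t = i * h
        k1 = f(state, t)
        k2 = f(state + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(state + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(state + h * k3, t + h)
        state = state + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    xT, div_integral = state[:-1], state[-1]
    # As in practice, the prior p_T is approximated by a standard normal.
    return log_normal(xT, 1.0) + div_integral


def bits_per_dim(log_p, dim):
    """Convert a log-likelihood in nats to bits per dimension."""
    return -log_p / (dim * np.log(2.0))
```

Calling `ode_log_likelihood(x0)` for a point such as `np.array([0.3, -0.2, 0.1, 0.4])` agrees with the analytic `log_normal(x0, S0 ** 2)` to within about 1e-4; the small gap comes from approximating p_T by N(0, I). Because this toy density is concentrated, its bits/dim can be negative, unlike the discretized image likelihoods quoted above.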
