Improved Techniques for Maximum Likelihood Estimation for Diffusion ODEs
Kaiwen Zheng, Cheng Lu, Jianfei Chen, Jun Zhu
TL;DR
This paper targets likelihood estimation for diffusion ODEs, a family of continuous normalizing flows that admit exact likelihood evaluation yet have historically lagged behind variational methods. It introduces i-DODE, a suite of techniques spanning training (velocity parameterization, high-order flow matching, log-SNR timing, and variance reduction) and evaluation (training-free truncated-normal dequantization with importance sampling) to close the gap. The key contributions are a training-free dequantization that aligns the training and testing distributions, a velocity-based flow-matching framework with a second-order regularizer, and an importance sampling (IS) strategy that accelerates convergence. Empirically, i-DODE achieves state-of-the-art likelihood on CIFAR-10 and ImageNet-32 without variational dequantization or data augmentation (2.56 BPD on CIFAR-10 and 3.43/3.69 BPD on ImageNet-32), improves further when data augmentation is applied, and shows faster convergence and smoother ODE trajectories. Overall, the work provides practical, scalable improvements for density estimation with diffusion ODEs and makes them more competitive among likelihood-based generative models.
Abstract
Diffusion models have exhibited excellent performance in various domains. The probability flow ordinary differential equation (ODE) of diffusion models (i.e., diffusion ODEs) is a particular case of continuous normalizing flows (CNFs), which enables deterministic inference and exact likelihood evaluation. However, the likelihood estimation results of diffusion ODEs are still far from those of state-of-the-art likelihood-based generative models. In this work, we propose several improved techniques for maximum likelihood estimation for diffusion ODEs, covering both training and evaluation. For training, we propose velocity parameterization and explore variance reduction techniques for faster convergence. We also derive an error-bounded high-order flow matching objective for finetuning, which improves the ODE likelihood and smooths its trajectory. For evaluation, we propose a novel training-free truncated-normal dequantization to close the training-evaluation gap that commonly exists in diffusion ODEs. Building upon these techniques, we achieve state-of-the-art likelihood estimation results on image datasets (2.56 bits per dimension (BPD) on CIFAR-10, 3.43/3.69 BPD on ImageNet-32) without variational dequantization or data augmentation, and 2.42 BPD on CIFAR-10 with data augmentation. Code is available at https://github.com/thu-ml/i-DODE.
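To make "exact likelihood evaluation" concrete, here is a minimal sketch of how a CNF-style ODE yields a log-likelihood through the instantaneous change-of-variables formula, log p_0(x_0) = log p_T(x_T) + ∫_0^T div v(x_t, t) dt. It is not the i-DODE implementation. The one-dimensional linear drift, the standard normal prior, and the names `velocity`, `divergence`, `ode_log_likelihood`, and `bits_per_dim` are all illustrative assumptions. In a real diffusion ODE the drift would be a trained network, and the divergence would commonly be estimated stochastically rather than computed in closed form.

```python
import math

A = 0.5  # hypothetical drift coefficient for the toy ODE dx/dt = A * x


def velocity(x, t):
    # Toy stand-in for a learned velocity field (assumption, not the paper's model).
    return A * x


def divergence(x, t):
    # In 1-D the trace of the Jacobian is just d(velocity)/dx, here the constant A.
    return A


def ode_log_likelihood(x0, T=1.0, steps=1000):
    """Integrate the state and the log-density change jointly with RK4,
    from t=0 (data) to t=T (standard normal prior)."""
    h = T / steps
    x, delta, t = x0, 0.0, 0.0
    for _ in range(steps):
        k1, d1 = velocity(x, t), divergence(x, t)
        x2 = x + 0.5 * h * k1
        k2, d2 = velocity(x2, t + 0.5 * h), divergence(x2, t + 0.5 * h)
        x3 = x + 0.5 * h * k2
        k3, d3 = velocity(x3, t + 0.5 * h), divergence(x3, t + 0.5 * h)
        x4 = x + h * k3
        k4, d4 = velocity(x4, t + h), divergence(x4, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        delta += h * (d1 + 2 * d2 + 2 * d3 + d4) / 6
        t += h
    log_prior = -0.5 * x * x - 0.5 * math.log(2 * math.pi)  # log N(x_T; 0, 1)
    return log_prior + delta


def bits_per_dim(log_likelihood_nats, num_dims):
    # Convert a log-likelihood in nats to bits per dimension (BPD).
    return -log_likelihood_nats / (num_dims * math.log(2))


if __name__ == "__main__":
    ll = ode_log_likelihood(0.3)
    print("log-likelihood (nats):", ll)
    print("BPD for this 1-D toy:", bits_per_dim(ll, 1))
```

For this toy drift the data density is Gaussian with variance exp(-2·A·T), so the numerical result can be checked against a closed form. The `bits_per_dim` helper only rescales units; for image data, the reported BPD also depends on how discrete pixels are dequantized, which this sketch does not model.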
