Improving Diffusion-Based Generative Models via Approximated Optimal Transport
Daegyu Kim, Jooyoung Choi, Chaehun Shin, Uiwon Hwang, Sungroh Yoon
TL;DR
This work tackles the high curvature and truncation errors that limit diffusion-based image synthesis by introducing Approximated Optimal Transport (AOT), a training scheme that approximates optimal transport via Hungarian assignment to pair images with informative noise. By reducing the information entropy of the training targets, AOT yields straighter, lower-curvature ODE trajectories and enables high-quality generation with far fewer function evaluations (NFEs). On CIFAR-10, AOT reaches an FID of 1.88 at 27 NFEs for unconditional generation and 1.73 at 29 NFEs for conditional generation. The method also integrates with Discriminator Guidance (DG) by training the discriminator on AOT-synthesized pairs, which improves FID further to 1.68 (unconditional) and 1.58 (conditional), state-of-the-art scores at 29 NFEs. Overall, AOT offers a training-centered path to reduce sampling costs while maintaining or improving image quality, with configurable GPU-memory strategies and potential for extension to conditional guidance beyond images.
Abstract
We introduce the Approximated Optimal Transport (AOT) technique, a novel training scheme for diffusion-based generative models. Our approach aims to approximate and integrate optimal transport into the training process, significantly enhancing the ability of diffusion models to estimate the denoiser outputs accurately. This improvement leads to ODE trajectories of diffusion models with lower curvature and reduced truncation errors during sampling. We achieve superior image quality and reduced sampling steps by employing AOT in training. Specifically, we achieve FID scores of 1.88 with just 27 NFEs and 1.73 with 29 NFEs in unconditional and conditional generations, respectively. Furthermore, when applying AOT to train the discriminator for guidance, we establish new state-of-the-art FID scores of 1.68 and 1.58 for unconditional and conditional generations, respectively, each with 29 NFEs. This outcome demonstrates the effectiveness of AOT in enhancing the performance of diffusion models.
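The pairing step mentioned in the TL;DR (Hungarian assignment between images and noise) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The squared Euclidean cost, the per-batch pairing, and the function names `squared_l2`, `hungarian`, and `pair_images_with_noise` are all assumptions made here for illustration. In practice a library routine such as `scipy.optimize.linear_sum_assignment` would likely replace the hand-written solver.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two flattened vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hungarian(cost):
    """Solve the assignment problem for a cost matrix with rows <= columns.

    Returns a list `assignment` where assignment[i] is the column matched
    to row i, minimizing the total cost. Standard O(n^3) potentials method.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (m + 1)      # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:   # reached an unmatched column
                break
        while j0 != 0:       # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


def pair_images_with_noise(images, noises):
    """Match each image to one noise sample, minimizing total squared L2 cost.

    Assumption: images and noises are equal-length lists of flattened vectors
    drawn for a single training batch.
    """
    cost = [[squared_l2(img, eps) for eps in noises] for img in images]
    assignment = hungarian(cost)
    total = sum(cost[i][j] for i, j in enumerate(assignment))
    return assignment, total


# Toy batch: each "image" is closest to the noise sample at the other index.
toy_images = [[0, 0], [10, 10]]
toy_noises = [[9, 9], [1, 1]]
toy_assignment, toy_total = pair_images_with_noise(toy_images, toy_noises)
```

In the toy batch, random (identity) pairing would cost 324, while the assignment swaps the two noise samples and lowers the total cost to 4. This is the sense in which the selected noise is more informative about its paired image than an independently drawn sample.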
