Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Rafał Karczewski, Markus Heinonen, Vikas Garg
TL;DR
This work introduces augmented stochastic dynamics that track the log-density of diffusion model samples, and develops a mode-tracking ODE together with a high-density sampler. It shows that diffusion models assign dramatically higher likelihood to some samples than to those produced by standard samplers; these samples are often cartoon-like or blurry, depending on the noise level, even when the training dataset contains no such images. The authors provide practical likelihood estimation tools, bounds on the differences between density formulations, and a high-density sampling algorithm, exposing a surprising mismatch between likelihood and image realism with implications for both theory and generation strategies. Overall, the study reveals intricate diffusion-model probability landscapes and cautions against focusing solely on high-quality samples, highlighting both methodological advances and the limits of current likelihood-tracking approaches.
Abstract
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than standard samplers do. Our empirical findings reveal the existence of significantly higher-likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost. Code is available at https://github.com/Aalto-QuML/high-density-diffusion.
