
Diffusion Models as Cartoonists: The Curious Case of High Density Regions

Rafał Karczewski, Markus Heinonen, Vikas Garg

TL;DR

This work introduces augmented stochastic dynamics that track the log-density of diffusion model samples, and develops a mode-tracking ODE together with a practical high-density sampler. It shows that diffusion models assign far higher likelihood to some images than to those produced by standard samplers. These high-density images often look cartoon-like or blurry, depending on the noise level, even when the training data contains no such content. The authors provide practical likelihood-estimation tools, bounds on the differences between density formulations, and a high-density sampling algorithm. They also uncover a surprising mismatch between likelihood and image realism, with implications for both theory and generation strategies. Overall, the study exposes an intricate probability landscape in diffusion models, cautions against studying only the high-quality samples that standard samplers produce, and highlights both the methodological advances and the limits of current likelihood-tracking approaches.
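
Why can a sampler miss points of much higher density than anything it returns? A textbook high-dimensional Gaussian already shows the effect. This is a generic illustration, not the paper's model or experiment. In D dimensions, standard normal samples concentrate at a radius of about √D, while the mode at the origin exceeds their log-density by roughly D/2 nats on average. The dimension 3 × 32 × 32 below is an arbitrary stand-in for a small RGB image, and the sketch says nothing about why the paper's high-density images look cartoon-like.

```python
import numpy as np


def std_normal_logpdf(x):
    """log N(x; 0, I), evaluated over the last axis."""
    d = x.shape[-1]
    return -0.5 * d * np.log(2 * np.pi) - 0.5 * np.sum(x**2, axis=-1)


d = 3 * 32 * 32                      # hypothetical dimension of a 32x32 RGB image
rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, d))

sample_logp = std_normal_logpdf(samples)     # log-density of typical samples
mode_logp = std_normal_logpdf(np.zeros(d))   # log-density at the mode x = 0
gap = mode_logp - sample_logp.mean()         # concentrates around d / 2
mean_norm = np.linalg.norm(samples, axis=1).mean()

print(f"typical sample norm ~ {mean_norm:.1f}, sqrt(d) = {np.sqrt(d):.1f}")
print(f"log-density gap, mode vs. samples ~ {gap:.0f} nats (d/2 = {d / 2:.0f})")
```

Every one of the 1000 samples has a lower log-density than the origin, even though no sample lands anywhere near it.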

Abstract

We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than standard samplers. Our empirical findings reveal significantly higher-likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge even in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost. Code is available at https://github.com/Aalto-QuML/high-density-diffusion.
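
The likelihood-tracking idea can be tried on the same kind of toy as Figure 3: a Gaussian mixture with known score and known $p_T$. What follows is a hedged sketch derived here from Itô's lemma and the Fokker-Planck equation, not code from the paper or its repository, so notation and time convention may differ from Theorem 1. For a forward SDE $dx = f(x,t)\,dt + g(t)\,dw$, run the reverse SDE in reversed time $s = T - t$ as $dy_s = [-f + g^2 \nabla \log p_{T-s}(y_s)]\,ds + g\,dB_s$. Then $d \log p_{T-s}(y_s) = [\nabla\cdot f + \tfrac12 g^2 \|\nabla \log p_{T-s}(y_s)\|^2]\,ds + g\,\nabla \log p_{T-s}(y_s)\cdot dB_s$, which needs only the score already computed for the sampling step. The mixture parameters, the variance-exploding choice $f = 0$ with constant $g$, the exact draw from $p_T$, and all function names are assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy data distribution: a 1-D two-component Gaussian mixture.
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 0.5])


def logp_and_score(x, t, g):
    """Exact log p_t(x) and d/dx log p_t(x) under the VE SDE dx = g dW.

    Diffusing for time t adds variance g**2 * t to every mixture component.
    """
    var = STDS**2 + g**2 * t                          # (K,) component variances
    diff = x[:, None] - MEANS[None, :]                # (N, K)
    log_comp = np.log(WEIGHTS) - 0.5 * np.log(2 * np.pi * var) - 0.5 * diff**2 / var
    m = log_comp.max(axis=1, keepdims=True)           # stable log-sum-exp
    log_p = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    resp = np.exp(log_comp - log_p[:, None])          # component responsibilities
    score = np.sum(resp * (-diff / var), axis=1)
    return log_p, score


def augmented_reverse_sde(n_paths=2000, n_steps=2000, T=1.0, g=1.0, seed=0):
    """Euler-Maruyama on the reverse SDE, accumulating the log-density increment.

    Returns samples x_0, tracked log-densities r_0, and exact log p_0(x_0).
    """
    rng = np.random.default_rng(seed)
    # Exact draw from p_T and exact starting value r_T = log p_T(x_T).
    k = rng.choice(len(WEIGHTS), size=n_paths, p=WEIGHTS)
    x = MEANS[k] + np.sqrt(STDS[k]**2 + g**2 * T) * rng.standard_normal(n_paths)
    r, _ = logp_and_score(x, T, g)
    h = T / n_steps
    for i in range(n_steps):
        t = T - i * h
        _, score = logp_and_score(x, t, g)            # the update needs only the score
        dB = np.sqrt(h) * rng.standard_normal(n_paths)  # shared by both updates
        r = r + 0.5 * g**2 * score**2 * h + g * score * dB  # f = 0, so div f = 0
        x = x + g**2 * score * h + g * dB
    log_p0, _ = logp_and_score(x, 0.0, g)
    return x, r, log_p0


x0, r0, log_p0 = augmented_reverse_sde()
print("mean |r_0 - log p_0(x_0)|:", np.mean(np.abs(r0 - log_p0)))
```

With these settings the tracked value r_0 should stay close to the exact log p_0(x_0); reducing n_steps increases the Euler-Maruyama error. The exact log p_t is computed here only for initialisation and checking, which a real model would not have.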

Paper Structure

This paper contains 51 sections, 17 theorems, 143 equations, 14 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Let ${\bm{x}}$ be a random process defined by the reverse SDE, Eq. (eq:rev-sde). Then

Figures (14)

  • Figure 1: High-density samples ${\bm{y}}_0|{\bm{x}}_t$ resemble cartoon drawings. This is in contrast to regular denoising samples ${\bm{x}}_0 \sim p({\bm{x}}_0|{\bm{x}}_t)$ or expectations $\mathbb{E}[{\bm{x}}_0|{\bm{x}}_t]$. The training data contains no cartoons.
  • Figure 2: Our contributions.
  • Figure 3: Tracking stochastic sampling likelihood. Estimation of $\log p_t({\bm{x}}_t)$ (colored trajectory) for stochastic sampling via the Augmented Reverse SDE, Eq. (eq:rev-sde-logp-dynamics), on a Gaussian mixture with known $\nabla_{\bm{x}} \log p_t({\bm{x}})$ and $p_T$. Evaluating $d\log p_t({\bm{x}}_t)$ requires only the score function.
  • Figure 4: $r_0 > \log p_0^{\mathrm{ODE}}({\bm{x}}_0) > \mathrm{ELBO}({\bm{x}}_0)$ correlate strongly.
  • Figure 5: Eq. (eq:mode-ode) accurately recovers the mode-tracking curve.
  • ...and 9 more figures
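
Figure 1 contrasts the mode of the denoising distribution $p({\bm{x}}_0|{\bm{x}}_t)$ with samples from it and with its mean $\mathbb{E}[{\bm{x}}_0|{\bm{x}}_t]$. In one dimension the difference can be computed by brute force. The sketch below assumes a hypothetical two-component Gaussian mixture as data and the noising $x_t = x_0 + \tau\varepsilon$; it locates the mode by grid search, which merely stands in for the paper's mode-tracking ODE (Theorem 5) and is not that method. All names and parameter values are illustrative.

```python
import numpy as np

# Hypothetical 1-D data distribution: a two-component Gaussian mixture.
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-2.0, 2.0])
STDS = np.array([0.5, 0.5])


def log_prior(x0):
    """log p(x_0) of the mixture, via a stable log-sum-exp."""
    z = (x0[:, None] - MEANS) / STDS
    log_comp = np.log(WEIGHTS) - 0.5 * np.log(2 * np.pi * STDS**2) - 0.5 * z**2
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))


def log_posterior(x0, x_t, tau):
    """Unnormalised log p(x_0 | x_t) for the noising x_t = x_0 + tau * eps."""
    return log_prior(x0) - 0.5 * ((x_t - x0) / tau) ** 2


def denoising_mode_and_mean(x_t, tau, grid=np.linspace(-6.0, 6.0, 24001)):
    """Mode and mean of p(x_0 | x_t), both evaluated on a dense grid."""
    lp = log_posterior(grid, x_t, tau)
    w = np.exp(lp - lp.max())
    w /= w.sum()                      # normalised posterior weights on the grid
    mode = grid[np.argmax(lp)]        # highest-density denoised value
    mean = np.sum(grid * w)           # E[x_0 | x_t], what an MSE denoiser targets
    return mode, mean


x_t, tau = 0.3, 1.0
mode, mean = denoising_mode_and_mean(x_t, tau)
gap = log_posterior(np.array([mode]), x_t, tau)[0] - log_posterior(np.array([mean]), x_t, tau)[0]
print(f"mode = {mode:.2f}, mean = {mean:.2f}, log-density gap = {gap:.2f}")
```

In this toy the posterior mean falls between the two components, at a point the posterior itself rates as comparatively unlikely, while the mode sits on the dominant component. The toy only shows that the three quantities in Figure 1 can differ; it does not reproduce the cartoon-like appearance.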

Theorems & Definitions (32)

  • Theorem 1: Augmented reverse SDE
  • Theorem 2: Augmented forward SDE
  • Theorem 3: Approximate Augmented Reverse SDE
  • Theorem 4: Approximate Augmented Forward SDE
  • Theorem 5: Mode-tracking ODE
  • Remark 1: High-density ODE or HD-ODE
  • Lemma 1: General instantaneous change of variables
  • Theorem 2: Augmented forward SDE
  • Proof
  • Theorem 1: Augmented reverse SDE
  • ...and 22 more