Devil is in the Details: Density Guidance for Detail-Aware Generation with Flow Models

Rafał Karczewski, Markus Heinonen, Vikas Garg

TL;DR

This work addresses the mismatch between image likelihood and perceptual detail in diffusion and flow models: higher-density samples tend to be smoother and less detailed. It introduces Score Alignment, a condition that explains why latent-code scaling (Prior Guidance) affects detail, and shows that the condition can be tractably checked for any continuous normalizing flow (CNF). It then derives Density Guidance, a principled modification of the generative ODE that enforces an explicit log-density trajectory during sampling. Stochastic Density Guidance extends this to stochastic sampling, preserving exact log-density control while allowing controlled variation in either high-level structure or fine details. Empirically, Density Guidance achieves fine-grained control of image detail with sample quality on par with prior methods, and the approach carries over to conditional generation and stochastic sampling.
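
The link between latent-code scaling and the prior score can be made concrete with a small sketch. It assumes a standard normal prior $p_T = \mathcal{N}(0, I)$, for which $\nabla \log p_T({\bm{x}}_T) = -{\bm{x}}_T$, so a step along the score is the same as rescaling the latent code. The function names below are illustrative and are not taken from the paper's code.

```python
import numpy as np

def prior_log_density(x):
    """Log-density of a standard normal N(0, I) evaluated at x."""
    d = x.shape[-1]
    return -0.5 * np.sum(x**2, axis=-1) - 0.5 * d * np.log(2.0 * np.pi)

def prior_score(x):
    """Score of N(0, I): grad_x log p_T(x) = -x."""
    return -x

def prior_guidance_step(x_T, step):
    """Move the latent along the prior score (hypothetical helper).

    step > 0 raises log p_T(x_T); step < 0 lowers it.
    """
    return x_T + step * prior_score(x_T)

rng = np.random.default_rng(0)
x_T = rng.standard_normal(8)
step = 0.1

x_up = prior_guidance_step(x_T, step)     # towards higher prior density
x_down = prior_guidance_step(x_T, -step)  # towards lower prior density

# For a standard normal prior the step is exactly a rescaling of the latent.
scaled = (1.0 - step) * x_T
print(np.allclose(x_up, scaled))
print(prior_log_density(x_down), prior_log_density(x_T), prior_log_density(x_up))
```

Whether the change in $\log p_T({\bm{x}}_T)$ carries over to $\log p_0({\bm{x}}_0)$ after decoding is exactly what Score Alignment is meant to guarantee; this sketch only covers the prior side.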

Abstract

Diffusion models have emerged as a powerful class of generative models, capable of producing high-quality images by mapping noise to a data distribution. However, recent findings suggest that image likelihood does not align with perceptual quality: high-likelihood samples tend to be smooth, while lower-likelihood ones are more detailed. Controlling sample density is thus crucial for balancing realism and detail. In this paper, we analyze an existing technique, Prior Guidance, which scales the latent code to influence image detail. We introduce score alignment, a condition that explains why this method works and show that it can be tractably checked for any continuous normalizing flow model. We then propose Density Guidance, a principled modification of the generative ODE that enables exact log-density control during sampling. Finally, we extend Density Guidance to stochastic sampling, ensuring precise log-density control while allowing controlled variation in structure or fine details. Our experiments demonstrate that these techniques provide fine-grained control over image detail without compromising sample quality. Code is available at https://github.com/Aalto-QuML/density-guidance.
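
To illustrate what "exact log-density control" means, the toy sketch below steers a sample so that its log-density changes at a prescribed rate. It is not the paper's Density Guidance: it assumes a fixed (time-independent) standard normal density and a zero base velocity, and relies only on the chain rule $\frac{d}{dt} \log p({\bm{x}}_t) = \nabla \log p({\bm{x}}_t)^\top \dot{{\bm{x}}}_t$. The names `guided_velocity` and `integrate` are hypothetical.

```python
import numpy as np

def log_density(x):
    """Log-density of a standard normal N(0, I) at a single point x."""
    d = x.shape[-1]
    return -0.5 * np.sum(x**2) - 0.5 * d * np.log(2.0 * np.pi)

def score(x):
    """Score of N(0, I): grad_x log p(x) = -x."""
    return -x

def guided_velocity(x, rate):
    """Velocity along the score, scaled so that d/dt log p(x_t) = rate.

    Toy assumption: the density does not change with time, so the chain
    rule gives d/dt log p(x_t) = score(x_t) . velocity = rate.
    """
    s = score(x)
    return rate * s / np.dot(s, s)

def integrate(x0, rate, t_end=1.0, n_steps=1000):
    """Forward-Euler integration of dx/dt = guided_velocity(x, rate)."""
    x = x0.copy()
    dt = t_end / n_steps
    for _ in range(n_steps):
        x = x + dt * guided_velocity(x, rate)
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)

x_more = integrate(x0, rate=+1.0)  # log-density should rise by about 1
x_less = integrate(x0, rate=-1.0)  # log-density should fall by about 1

delta_more = log_density(x_more) - log_density(x0)
delta_less = log_density(x_less) - log_density(x0)
print(delta_more, delta_less)
```

In a real flow model the density evolves with time and the base velocity is non-zero, so the correction term must also account for how the unguided ODE changes the log-density; the paper's derivation handles that case.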

Paper Structure

This paper contains 44 sections, 8 theorems, 133 equations, 22 figures, 1 algorithm.

Key Result

Lemma 1.1

Let $f: \mathbb{R}^D \to \mathbb{R}^D$ have continuous second-order derivatives and let ${\bm{v}} \in \mathbb{R}$. Define $g({\bm{x}}) \coloneqq \operatorname{div} f({\bm{x}}) = \sum_{i=1}^D \frac{\partial f^i}{\partial x_i}({\bm{x}})$ and $G({\bm{x}}) \coloneqq \frac{\partial f}{\partial {\bm{x}}}({\bm{x}})$ [...] Equivalently, we write it as [...]

Figures (22)

  • Figure 1: Density guidance controls the amount of detail. Images sampled from the same condition and latent code with different strengths of guidance. Top: Stable Diffusion v2.1 [rombach2021highresolution]. Bottom: EDM2 [karras2024analyzing].
  • Figure 2: Prior Guidance and Score Alignment (SA). Prior guidance works by moving ${\bm{x}}_T$ (noise) in the direction of $\textcolor{#ED1C24}{\nabla \log p_T({\bm{x}}_T)}$ and decoding to ${\bm{x}}_0$ (image). The purpose of this is to increase or decrease $\log p_0({\bm{x}}_0)$, which is inversely related to the level of detail in ${\bm{x}}_0$. SA is a condition that ensures prior guidance is effective by requiring the alignment of score vectors across time steps. Red arrows represent tangents to the curves: $\nabla \log p_T({\bm{x}}_T)$ is the tangent to the left curve at ${\bm{x}}_T$, and its push-forward via $\frac{\partial {\bm{x}}_0}{\partial {\bm{x}}_T}$ is the tangent to the decoded curve at ${\bm{x}}_0$. SA states that the transformed tangent vector must align with $\textcolor{#27AAE1}{\nabla \log p_0({\bm{x}}_0)}$ such that the angle $\alpha \leq 90^\circ$ (non-negative dot product).
  • Figure 3: Nearly all ${\bm{x}}_T$ satisfy the positive score alignment condition of Eq. \ref{eq:vfa} across models and datasets.
  • Figure 4: Negative $\log p_t({\bm{x}}_t)$ correlates well with image compression size, while the recently proposed intrinsic dimensionality measure FLIPD [kamkari2024geometric] correlates only weakly. Experiment performed with EDM2, a latent diffusion model trained on ImageNet at $512 \times 512$ resolution [karras2024analyzing].
  • Figure 5: The augmented sensitivity equations of Eq. \ref{eq:aug-sensitivity} accurately track the score alignment (SA).
  • ...and 17 more figures
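
The Score Alignment condition described in the Figure 2 caption can be checked in closed form on a toy example. The sketch below assumes a linear decoder ${\bm{x}}_0 = A {\bm{x}}_T$ with a standard normal prior, so that $p_0 = \mathcal{N}(0, A A^\top)$ and the Jacobian $\frac{\partial {\bm{x}}_0}{\partial {\bm{x}}_T}$ is simply $A$. The matrix `A` and all function names are illustrative assumptions; the paper instead tracks SA for trained CNFs through augmented sensitivity equations, which this example does not implement.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Hypothetical linear "decoder": a small perturbation of the identity,
# which keeps A invertible and well conditioned.
A = np.eye(dim) + 0.1 * rng.standard_normal((dim, dim))

def decode(x_T):
    """Map noise x_T to data x_0 = A x_T."""
    return A @ x_T

def score_prior(x_T):
    """Score of the prior N(0, I): grad log p_T(x_T) = -x_T."""
    return -x_T

def score_data(x_0):
    """Score of N(0, A A^T): grad log p_0(x_0) = -(A A^T)^{-1} x_0."""
    return -np.linalg.solve(A @ A.T, x_0)

def score_alignment(x_T):
    """Dot product between the pushed-forward prior score and the data score."""
    jacobian = A  # d x_0 / d x_T for a linear decoder
    pushed = jacobian @ score_prior(x_T)
    return float(np.dot(pushed, score_data(decode(x_T))))

samples = rng.standard_normal((1000, dim))
alignments = np.array([score_alignment(x) for x in samples])

# Fraction of latents with non-negative alignment (angle at most 90 degrees).
print(np.mean(alignments >= 0.0))
```

For this linear case the dot product simplifies algebraically to $\lVert {\bm{x}}_T \rVert^2$, so the condition holds for every latent. Nonlinear models offer no such guarantee, which is why an empirical check like the one in Figure 3 is needed.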

Theorems & Definitions (14)

  • Lemma 1.1
  • proof
  • Lemma 1.2
  • proof
  • Lemma 6.1: Quadratic CLT [deJong1987]
  • Lemma 7.1: Itô's Lemma
  • Corollary 1: Reverse-time Itô's lemma
  • proof
  • Lemma 14.1
  • proof
  • ...and 4 more