Probability Density Geodesics in Image Diffusion Latent Space

Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang, Peter Henry Tu, Jing Zhang, Hongdong Li, Richard Hartley, Dylan Campbell

TL;DR

This work introduces probability-density geodesics in diffusion latent space: under a density-weighted metric with weight $w(\gamma)=p(\gamma)^{-1}$, geodesics preferentially traverse high-density regions, which correspond to plausible images. It derives the theoretical framework (weighted path length, Euler–Lagrange equations, functional derivatives) and operationalizes it in the latent space of a pre-trained diffusion model (Stable Diffusion) via conditional densities, score estimation, and efficient IVP/BVP solvers. The authors demonstrate training-free image sequence interpolation and extrapolation, analyze how closely video data follows geodesics, and report competitive quantitative metrics (FID, PPL, PDV, TOPIQ) across datasets without model fine-tuning. The approach gives a principled way to study latent-space geometry and enables practical sequence-editing tasks with pre-trained diffusion models.

Abstract

Diffusion models indirectly estimate the probability density over a data space, which can be used to study its structure. In this work, we show that geodesics can be computed in diffusion latent space, where the norm induced by the spatially-varying inner product is inversely proportional to the probability density. In this formulation, a path that traverses a high density (that is, probable) region of image latent space is shorter than the equivalent path through a low density region. We present algorithms for solving the associated initial and boundary value problems and show how to compute the probability density along the path and the geodesic distance between two points. Using these techniques, we analyze how closely video clips approximate geodesics in a pre-trained image diffusion space. Finally, we demonstrate how these techniques can be applied to training-free image sequence interpolation and extrapolation, given a pre-trained image diffusion model.
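To make the "shorter through high density" claim concrete, the following is a minimal sketch on a toy 2D density, not the authors' solver. It assumes the weighted path length takes the form $L[\gamma]=\int \lVert\dot\gamma(t)\rVert \, p(\gamma(t))^{-1}\,dt$, which follows from the weight $w=p^{-1}$ stated in the TL;DR. The ring-shaped density, the function names `density`, `euclidean_length`, and `weighted_length`, and the midpoint-rule discretization are all illustrative assumptions.

```python
import numpy as np

def density(x):
    """Toy unnormalized density (assumed for illustration): high near the unit circle."""
    r = np.linalg.norm(x, axis=-1)
    # Small floor keeps 1/p finite far from the ring.
    return np.exp(-((r - 1.0) ** 2) / (2 * 0.2 ** 2)) + 1e-6

def euclidean_length(path):
    """Ordinary length of a polyline given as an (N, 2) array of points."""
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))

def weighted_length(path):
    """Midpoint-rule approximation of the integral of ||d gamma|| / p(gamma)."""
    seg = np.diff(path, axis=0)            # segment vectors
    mid = 0.5 * (path[:-1] + path[1:])     # segment midpoints
    return float(np.sum(np.linalg.norm(seg, axis=1) / density(mid)))

# Endpoints A and B both lie on the high-density ring.
A = np.array([-1.0, 0.0])
B = np.array([1.0, 0.0])
t = np.linspace(0.0, 1.0, 201)

# Straight line from A to B: crosses the low-density centre of the ring.
straight = (1 - t)[:, None] * A + t[:, None] * B

# Half-circle from A to B: stays on the high-density ring.
theta = np.pi * (1 - t)
arc = np.stack([np.cos(theta), np.sin(theta)], axis=1)

print("Euclidean length: straight", euclidean_length(straight), "arc", euclidean_length(arc))
print("Weighted length:  straight", weighted_length(straight), "arc", weighted_length(arc))
```

In this toy setting the arc is longer in the Euclidean sense but much shorter under the density-weighted length, because the straight line pays a large penalty where the density is close to zero. A geodesic solver would search over paths to minimize the weighted length rather than compare two fixed candidates.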

Paper Structure

This paper contains 39 sections, 25 equations, 16 figures, 3 tables, 2 algorithms.

Figures (16)

  • Figure 1: Given a probability density and initial or boundary conditions (here, the position of two points A and B), geodesics can be computed in this space. If the norm is chosen to be inversely proportional to the probability density, these geodesics preferentially traverse high density regions of the space. For an image data space, these correspond to plausible, realistic images according to the probability density, such as that learned by an image diffusion model. Here, we show the outputs of our boundary value problem (BVP) solver that computes a geodesic between endpoints A and B on a toy 2D example. (a) The geodesic and straight-line trajectories between A and B, given the underlying visualized probability density field (contours) and its gradient (arrows). (b) Probability density curves for both trajectories, showing that the straight-line path drops to zero probability very rapidly whereas the geodesic remains in higher probability regions. (c) Images corresponding to points along a geodesic in Stable Diffusion [rombach2022high] latent space, given the left and right endpoints, computed using our BVP solver.
  • Figure 2: Pipelines for the image interpolation and extrapolation tasks, addressed by solving boundary and initial value problems in diffusion latent space.
  • Figure 3: Analysis of simple videos generated using CLEVR [johnson2017clevr]. If a video path were a geodesic in the diffusion latent space, we would expect (i) the norm of the geodesic gradient along the video path $\gamma_v$ to be near zero, (ii) that norm to be close to the norm for the optimized path $\gamma_o$, and (iii) the norm for the perturbed path $\gamma_\delta$ or the smoothed path $\gamma_s$ to be larger. From the evidence, we conclude that many of the videos are approximately geodesic.
  • Figure 4: Qualitative comparison of image interpolation results.
  • Figure 5: Example of the evolution of the probability density along the path and the path length during BVP optimization.
  • ...and 11 more figures