Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang, Peter Henry Tu, Jing Zhang, Hongdong Li, Richard Hartley, Dylan Campbell
TL;DR
This work introduces probability-density geodesics in diffusion latent space. Under a density-weighted metric with weight $w(\gamma)=p(\gamma)^{-1}$, low-density regions are expensive to cross, so geodesics preferentially traverse high-density, and hence plausible, image regions. The paper derives the theoretical framework (weighted path length, Euler–Lagrange equations, functional derivatives) and operationalizes it in the latent space of a pre-trained diffusion model (Stable Diffusion) via conditional densities, score estimation, and efficient IVP/BVP solvers. The authors demonstrate training-free image sequence interpolation and extrapolation, analyze how closely video clips approximate geodesics, and report competitive FID, PPL, PDV, and TOPIQ scores across datasets without fine-tuning the model. The approach offers a principled way to study latent-space geometry, enables practical sequence-editing tasks with pre-trained diffusion models, and may inform future analysis of generative models.
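For readers who want the variational step spelled out, the following is a standard derivation for a conformal metric $w(x)^2 I$ with $w = p^{-1}$; the energy functional $E$ and the score symbol $s = \nabla \log p$ are introduced here for illustration, and the paper's exact formulation (for example, its use of conditional densities) may differ. The weighted path length and the associated energy, whose minimizers are constant-speed minimizers of $L$, are

$$L[\gamma] = \int_0^1 w(\gamma(t))\,\|\dot\gamma(t)\|\,dt, \qquad E[\gamma] = \frac{1}{2}\int_0^1 w(\gamma(t))^2\,\|\dot\gamma(t)\|^2\,dt.$$

The Euler–Lagrange equation of $E$ is

$$\frac{d}{dt}\left(w(\gamma)^2\,\dot\gamma\right) = w(\gamma)\,\nabla w(\gamma)\,\|\dot\gamma\|^2 .$$

Expanding the left-hand side, dividing by $w^2$, and using $\nabla \log w = -\nabla \log p = -s$ gives a second-order ODE that depends on the density only through its score:

$$\ddot\gamma = -\,s(\gamma)\,\|\dot\gamma\|^2 + 2\left(s(\gamma)^\top \dot\gamma\right)\dot\gamma .$$

Fixing $\gamma(0)$ and $\dot\gamma(0)$ yields an initial value problem (extrapolation); fixing $\gamma(0)$ and $\gamma(1)$ yields a boundary value problem (interpolation).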
Abstract
Diffusion models indirectly estimate the probability density over a data space, which can be used to study its structure. In this work, we show that geodesics can be computed in diffusion latent space, where the norm induced by the spatially varying inner product is inversely proportional to the probability density. In this formulation, a path that traverses a high-density (that is, probable) region of image latent space is shorter than the equivalent path through a low-density region. We present algorithms for solving the associated initial and boundary value problems and show how to compute the probability density along the path and the geodesic distance between two points. Using these techniques, we analyze how closely video clips approximate geodesics in a pre-trained image diffusion space. Finally, we demonstrate how these techniques can be applied to training-free image sequence interpolation and extrapolation, given a pre-trained image diffusion model.
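The claim that high-density paths are "shorter" can be checked on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: a 2-D standard Gaussian stands in for the diffusion model so that $p$ and the score $s = \nabla \log p = -x$ are analytic, and `density`, `score`, `weighted_length`, `geodesic_rhs`, and `shoot` are hypothetical helpers. It compares two paths with the same endpoints under the weight $w = 1/p$, then solves an initial value problem for the conformal-metric geodesic ODE $\ddot x = -s\|\dot x\|^2 + 2(s^\top \dot x)\dot x$ (assumed form) with classical RK4.

```python
import numpy as np

# Toy stand-in (assumption): a 2-D standard Gaussian replaces the diffusion
# model, so the density p and the score s = grad log p are closed-form.
def density(x):
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * np.sum(x**2, axis=-1)) / (2.0 * np.pi)


def score(x):
    # grad log p(x) = -x for the standard Gaussian.
    return -np.asarray(x, dtype=float)


def euclidean_length(path):
    return float(np.sum(np.linalg.norm(np.diff(path, axis=0), axis=1)))


def weighted_length(path):
    # Midpoint-rule approximation of the integral of w(gamma)|gamma'| dt,
    # with weight w = 1/p evaluated at each segment midpoint.
    seg = np.diff(path, axis=0)
    mid = 0.5 * (path[1:] + path[:-1])
    return float(np.sum(np.linalg.norm(seg, axis=1) / density(mid)))


def geodesic_rhs(x, v):
    # First-order form of x'' = -s|x'|^2 + 2 (s . x') x' for the
    # conformal metric w(x)^2 I with w = 1/p.
    s = score(x)
    return v, -s * np.dot(v, v) + 2.0 * np.dot(s, v) * v


def shoot(x0, v0, t_end=1.0, steps=1000):
    # Initial value problem: integrate the geodesic ODE with classical RK4.
    h = t_end / steps
    x, v = np.asarray(x0, dtype=float), np.asarray(v0, dtype=float)
    xs, vs = [x], [v]
    for _ in range(steps):
        k1x, k1v = geodesic_rhs(x, v)
        k2x, k2v = geodesic_rhs(x + 0.5 * h * k1x, v + 0.5 * h * k1v)
        k3x, k3v = geodesic_rhs(x + 0.5 * h * k2x, v + 0.5 * h * k2v)
        k4x, k4v = geodesic_rhs(x + h * k3x, v + h * k3v)
        x = x + h / 6.0 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v = v + h / 6.0 * (k1v + 2 * k2v + 2 * k3v + k4v)
        xs.append(x)
        vs.append(v)
    return np.array(xs), np.array(vs)


# Two paths between the same endpoints: a straight segment through a
# low-density region, and a longer detour through the mode at the origin.
t = np.linspace(0.0, 1.0, 201)[:, None]
a, b, c = np.array([-2.0, 2.0]), np.array([2.0, 2.0]), np.zeros(2)
straight = a + t * (b - a)
detour = np.vstack([a + t * (c - a), (c + t * (b - c))[1:]])

print("Euclidean:", euclidean_length(straight), euclidean_length(detour))
print("weighted: ", weighted_length(straight), weighted_length(detour))

# Shoot a geodesic; its metric speed |v| / p(x) should stay constant.
xs, vs = shoot([0.5, -0.3], [0.4, 0.7])
speed = np.linalg.norm(vs, axis=1) / density(xs)
print("relative drift of metric speed:", (speed.max() - speed.min()) / speed[0])
```

In this toy setting the detour is longer in Euclidean terms but shorter under $w = 1/p$, mirroring the abstract's statement. Replacing `score` with a learned score estimate would be the natural next step; the boundary value problem is not covered here and could be approached, for example, by shooting on `v0` or by optimizing a discretized path.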
