Probing the Geometry of Diffusion Models with the String Method

Elio Moreau, Florentin Coeurdoux, Grégoire Ferre, Eric Vanden-Eijnden

TL;DR

We introduce a framework based on the string method that computes continuous paths between samples by evolving curves under the learned score function, establishing the string method as a principled tool for probing the modal structure of diffusion models.

Abstract

Understanding the geometry of learned distributions is fundamental to improving and interpreting diffusion models, yet systematic tools for exploring their landscape remain limited. Standard latent-space interpolations fail to respect the structure of the learned distribution, often traversing low-density regions. We introduce a framework based on the string method that computes continuous paths between samples by evolving curves under the learned score function. Operating on pretrained models without retraining, our approach interpolates between three regimes: pure generative transport, which yields continuous sample paths; gradient-dominated dynamics, which recover minimum energy paths (MEPs); and finite-temperature string dynamics, which compute principal curves (self-consistent paths that balance energy and entropy). We demonstrate that the choice of regime matters in practice. For image diffusion models, MEPs contain high-likelihood but unrealistic "cartoon" images, confirming prior observations that likelihood maxima appear unrealistic; principal curves instead yield realistic morphing sequences despite lower likelihood. For protein structure prediction, our method computes transition pathways between metastable conformers directly from models trained on static structures, yielding paths with physically plausible intermediates. Together, these results establish the string method as a principled tool for probing the modal structure of diffusion models: identifying modes, characterizing barriers, and mapping connectivity in complex learned distributions.

Paper Structure (52 sections, 1 theorem, 23 equations, 31 figures, 6 tables, 1 algorithm)

Key Result

Proposition B.1

Let $\phi_t: [0,1] \to \mathbb{R}^d$ be a curve evolving under a velocity field $v_t$. Assume that the endpoints evolve as independent points: Then, for $s \in (0,1)$, the evolution preserves the arc-length parametrization $|\partial_s \phi_t(s)| = L(t)$ (constant in $s$) for an appropriate choice of $\lambda_t(s)$ with $\lambda_t(0) = \lambda_t(1) = 0$.
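
The following is a hedged sketch of why such a $\lambda_t(s)$ can exist, not the paper's proof. It assumes the curve evolves as $\partial_t \phi_t = v_t(\phi_t) + \lambda_t \, \partial_s \phi_t$, a form that this summary does not state explicitly, and it reads "endpoints evolve as independent points" as $\partial_t \phi_t(s) = v_t(\phi_t(s))$ for $s \in \{0, 1\}$, which is what $\lambda_t(0) = \lambda_t(1) = 0$ gives under that assumed form. The dot notation $\dot{L} = dL/dt$ is introduced here for convenience.

```latex
% Assumed evolution (not stated in the summary):
%   \partial_t \phi_t(s) = v_t(\phi_t(s)) + \lambda_t(s)\,\partial_s \phi_t(s)
%
% Differentiate the constraint |\partial_s \phi_t(s)|^2 = L(t)^2 in time and use
% \partial_s \phi_t \cdot \partial_s^2 \phi_t = 0, which holds because
% |\partial_s \phi_t| is constant in s.
\begin{align*}
L \dot{L}
  &= \partial_s \phi_t \cdot \partial_s \partial_t \phi_t
   = \partial_s \phi_t \cdot \partial_s\!\left[ v_t(\phi_t) \right]
     + L^2 \, \partial_s \lambda_t , \\
\partial_s \lambda_t(s)
  &= \frac{\dot{L}}{L}
     - \frac{1}{L^2} \, \partial_s \phi_t(s) \cdot \partial_s\!\left[ v_t(\phi_t(s)) \right] , \\
\lambda_t(s)
  &= s \, \frac{\dot{L}}{L}
     - \frac{1}{L^2} \int_0^s \partial_{s'} \phi_t \cdot
       \partial_{s'}\!\left[ v_t(\phi_t) \right] ds'
     \qquad \text{(so } \lambda_t(0) = 0 \text{)} , \\
\lambda_t(1) = 0
  &\iff
  \dot{L} = \frac{1}{L} \int_0^1 \partial_s \phi_t \cdot
            \partial_s\!\left[ v_t(\phi_t) \right] ds .
\end{align*}
```

Under these assumptions, the last line fixes how the length scale $L(t)$ must change, and the third line then gives a $\lambda_t(s)$ that vanishes at both endpoints while keeping $|\partial_s \phi_t(s)|$ constant in $s$.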

Figures (31)

  • Figure 1: The likelihood-realism paradox. Top: schematic showing the MEP (green) passing through the high-likelihood region (yellow), while the principal curve (red) stays within the typical set where data concentrates (blue points). Dashed lines indicate Voronoi cells, i.e., regions of points closest to each image along the string. Bottom: actual images at numbered locations. Endpoints (1, 2) are identical for both paths; the principal curve intermediate (3) is realistic; the MEP intermediate (4) is cartoonish. Full pathways computed by our method are shown in Figures \ref{fig:beaver_lambda} and \ref{fig:beaver_temperature}.
  • Figure 2: The string method. Grey dashed arrows show Step 1 (evolution): each image moves according to $v_t$, landing at positions marked with $\times$. Blue dotted arrows show Step 2 (reparametrization): images are redistributed to restore equal arc-length spacing along the string.
  • Figure 3: Relative score estimation error $\mathbb{E}_{\text{model}}[|s_t - \hat{s}_t|/|s_t|]$ as a function of $t$ for a mixture of Gaussians in various dimensions. The error increases sharply near $t=1$, motivating the quenching of $\gamma_t$ as $t \to 1$. For details, see Appendix \ref{app:testing}.
  • Figure 4: Effect of score weight $\gamma$. Left: string realizations for $\gamma$ ranging from 15 (top) to 2 (middle) to $10^{-2}$ (bottom) in logarithmic steps (factor of $\sqrt{10}$). Higher $\gamma$ drives paths through abstract, high-likelihood modes; lower $\gamma$ preserves realism. Right: log-likelihood of images along each string (colored curves), overlaid on the likelihood distribution of ImageNet validation images (heatmap; see Figure \ref{fig:likelihood_hist} for details). The intermediate images along the MEP reach likelihoods far exceeding those of typical images.
  • Figure 5: Effect of temperature $T$. Left: string realizations for $T$ ranging from 0.1 (top) to 0.5 (middle) to 0.9 (bottom). Lower $T$ drives paths through cartoon-like, high-likelihood regions; higher $T$ produces realistic samples. Right: log-likelihood of images along each string (colored curves), overlaid on the likelihood distribution of ImageNet validation images (heatmap; see Figure \ref{fig:likelihood_hist}). As $T$ increases, the likelihood of the intermediate images decreases toward typical values, and the images become more realistic.
  • ...and 26 more figures
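
The two-step iteration described in the Figure 2 caption (evolution, then reparametrization) can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `reparametrize`, `string_step`, and `toy_velocity` are hypothetical, the velocity is the negative gradient of a hand-written 2D potential rather than a learned score, the evolution uses a plain explicit Euler step, and the redistribution uses piecewise-linear interpolation.

```python
import numpy as np


def reparametrize(points):
    """Step 2: redistribute points to equal arc-length spacing.

    `points` has shape (n, d). The curve is treated as a polyline and the
    new points are placed by linear interpolation (an assumption of this
    sketch). Assumes consecutive points are distinct.
    """
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)  # segment lengths
    arc = np.concatenate([[0.0], np.cumsum(seg)])          # cumulative arc length
    target = np.linspace(0.0, arc[-1], len(points))        # equally spaced targets
    cols = [np.interp(target, arc, points[:, k]) for k in range(points.shape[1])]
    return np.stack(cols, axis=1)


def string_step(points, velocity, dt):
    """One iteration: Step 1 (evolution under `velocity`), then Step 2."""
    moved = points + dt * velocity(points)  # explicit Euler move of every point
    return reparametrize(moved)


def toy_velocity(x):
    """Hypothetical stand-in for v_t: minus the gradient of
    V(x, y) = (x^2 - 1)^2 + 2 (y - 1 + x^2)^2, with minima at (-1, 0) and (1, 0)."""
    px, py = x[:, 0], x[:, 1]
    gx = 4.0 * px * (px**2 - 1.0) + 8.0 * px * (py - 1.0 + px**2)
    gy = 4.0 * (py - 1.0 + px**2)
    return -np.stack([gx, gy], axis=1)


# Usage: start from a straight line between the two minima and relax it.
string = np.linspace([-1.0, 0.0], [1.0, 0.0], 21)
for _ in range(500):
    string = string_step(string, toy_velocity, dt=0.01)
```

In this sketch the endpoints are moved by the velocity like any other point and are left in place by the reparametrization, which mirrors the "endpoints evolve as independent points" assumption in Proposition B.1.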

Theorems & Definitions (5)

  • Definition 2.1: Minimum Energy Path
  • Definition 2.2: Principal Curve
  • Remark 2.3
  • Proposition B.1
  • proof