EnfoPath: Energy-Informed Analysis of Generative Trajectories in Flow Matching

Ziyun Li, Ben Dai, Huancheng Hu, Henrik Boström, Soon Hoe Lim

TL;DR

This work proposes kinetic path energy $E=\tfrac{1}{2}\int_0^1 \|v_\theta(x(t),t)\|^2\,dt$ as a trajectory-level diagnostic for flow-based generative samplers, treating each generation as a particle moving through a velocity field. It grounds this metric in a physics-inspired framework and connects it to the Benamou–Brenier formulation of the optimal transport cost when the flow is optimal. Empirically, higher $E$ correlates with stronger semantic content (CLIP score/margin) and with lower data density, indicating that informative samples tend to occupy sparse regions of the data manifold and require greater kinetic cost to reach. This trajectory-centric view provides interpretable insights beyond endpoint metrics and motivates theoretical extensions to stochastic samplers and OT-based analyses.
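The definition of $E$ above can be approximated numerically along a sampled trajectory. The following is a minimal sketch, not the authors' implementation: it assumes a fixed-step forward Euler solver and a left Riemann sum for the integral, and the names `kinetic_path_energy`, `velocity`, and `num_steps` are hypothetical. The callable `velocity` stands in for the learned field $v_\theta(x, t)$.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def kinetic_path_energy(
    velocity: Callable[[Vector, float], Vector],  # stand-in for v_theta(x, t)
    x0: Vector,                                   # sample from the reference distribution
    num_steps: int = 150,                         # assumed step count, not prescribed by the source
) -> Tuple[Vector, float]:
    """Integrate dx/dt = v(x, t) on [0, 1] with forward Euler and
    accumulate E ~= 0.5 * sum_k ||v(x_k, t_k)||^2 * dt along the way."""
    dt = 1.0 / num_steps
    x = list(x0)
    energy = 0.0
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        # Left Riemann sum of the kinetic term 0.5 * ||v||^2.
        energy += 0.5 * sum(c * c for c in v) * dt
        # Euler update of the particle position.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, energy


if __name__ == "__main__":
    # Toy field: constant velocity, so the path is a straight line.
    x_end, e = kinetic_path_energy(lambda x, t: [3.0, 4.0], [0.0, 0.0])
    print(x_end, e)
```

For a constant-velocity path the integrand is constant, so the discretized sum matches the continuous integral up to floating-point error. For a learned field, the estimate depends on the solver and the number of steps.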

Abstract

Flow-based generative models synthesize data by integrating a learned velocity field from a reference distribution to the target data distribution. Prior work has focused on endpoint metrics (e.g., fidelity, likelihood, perceptual quality) while overlooking a deeper question: what do the sampling trajectories reveal? Motivated by classical mechanics, we introduce kinetic path energy (KPE), a simple yet powerful diagnostic that quantifies the total kinetic effort along each generation path of ODE-based samplers. Through comprehensive experiments on CIFAR-10 and ImageNet-256, we uncover two key phenomena: (i) higher KPE predicts stronger semantic quality, indicating that semantically richer samples require greater kinetic effort, and (ii) KPE correlates inversely with data density, with informative samples residing in sparse, low-density regions. Together, these findings reveal that semantically informative samples naturally reside on the sparse frontier of the data distribution, demanding greater generative effort. Our results suggest that trajectory-level analysis offers a physics-inspired and interpretable framework for understanding generation difficulty and sample characteristics.

Paper Structure

This paper contains 13 sections, 5 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: Trajectory energy ($E$) positively correlates with CLIP score across CFG settings.
  • Figure 2: Trajectory energy ($E$) positively correlates with CLIP margin across CFG settings.
  • Figure 3: (a) 3D surfaces of $\log(\text{density})$ (left) and trajectory energy $E$ (right) (CIFAR-10, 150 steps) show high-density regions align with low energy. (b) Top 10% highest-energy samples consistently fall in low-density regions, confirming the inverse energy-density correlation.
  • Figure 3: Correlation metrics of $k$-NN and KDE methods on CIFAR-10 and ImageNet-256 for different $N$.
  • Figure 4: $E$ vs. k-NN log-density ($\rho=-0.6314$, $p=6.56\times10^{-223}$).
  • ...and 11 more figures
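
The figure captions above report a rank correlation between $E$ and a $k$-NN log-density estimate. A minimal sketch of how such a comparison could be set up is shown below. It is an illustration under stated assumptions, not the paper's pipeline: the density proxy is $-d \log r_k(x)$ (the $k$-NN estimator up to additive constants, with $r_k$ the distance to the $k$-th nearest neighbour and $d$ the feature dimension), the correlation is Spearman's $\rho$ with average ranks for ties, and the function names and the feature space are hypothetical.

```python
import math
from typing import List, Sequence


def knn_log_density(points: Sequence[Sequence[float]], k: int = 1) -> List[float]:
    """Log-density proxy -d * log(r_k) for each point, up to additive constants."""
    d = len(points[0])
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]
        # Small offset guards against log(0) when points coincide.
        scores.append(-d * math.log(r_k + 1e-12))
    return scores


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


if __name__ == "__main__":
    # Toy data: three clustered points and one isolated point.
    feats = [[0.0], [0.1], [0.2], [5.0]]
    energies = [0.5, 0.6, 0.7, 4.0]  # made-up values for illustration only
    print(spearman_rho(energies, knn_log_density(feats, k=1)))
```

A negative $\rho$ from such a computation would indicate that higher-energy samples tend to sit in lower-density regions. The brute-force neighbour search here is quadratic in the number of samples, so a real evaluation would likely use an indexed nearest-neighbour library.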