Unfolding Generative Flows with Koopman Operators: Fast and Interpretable Sampling

Erkan Turan, Aristotelis Siozopoulos, Louis Martinez, Julien Gaubil, Emery Pierson, Maks Ovsjanikov

TL;DR

This work addresses the slow sampling and limited interpretability of Continuous Normalizing Flows by learning a global Koopman representation of pre-trained Conditional Flow Matching dynamics. The dynamics are first made autonomous by augmenting the state with time; an encoder, generator, and decoder are then trained so that evolution in the latent space is linear and governed by a finite-dimensional generator $L$. This enables one-step sampling via the matrix exponential $e^{L}$ and spectral interpretability through Koopman modes. A simulation-free training objective enforces infinitesimal consistency with the teacher's vector field along full trajectories, yielding dynamics that stay faithful to the teacher and interpretability that is more robust than with boundary-only methods. Empirically, the approach achieves competitive sample quality with dramatic speedups, and the learned Koopman structure supports mode-based editing and transfer to pixel-space dynamics, highlighting practical benefits for controllable and interpretable generative modeling.

Abstract

Continuous Normalizing Flows (CNFs) enable elegant generative modeling but remain bottlenecked by slow sampling: producing a single sample requires solving a nonlinear ODE with hundreds of function evaluations. Recent approaches such as Rectified Flow and OT-CFM accelerate sampling by straightening trajectories, yet the learned dynamics remain nonlinear black boxes, limiting both efficiency and interpretability. We propose a fundamentally different perspective: globally linearizing flow dynamics via Koopman theory. By lifting Conditional Flow Matching (CFM) into a higher-dimensional Koopman space, we represent its evolution with a single linear operator. This yields two key benefits. First, sampling becomes one-step and parallelizable, computed in closed form via the matrix exponential. Second, the Koopman operator provides a spectral blueprint of generation, enabling novel interpretability through its eigenvalues and modes. We derive a practical, simulation-free training objective that enforces infinitesimal consistency with the teacher's dynamics and show that this alignment preserves fidelity along the full generative path, distinguishing our method from boundary-only distillation. Empirically, our approach achieves competitive sample quality with dramatic speedups, while uniquely enabling spectral analysis of generative flows.
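The one-step claim rests on a standard fact: if a latent state obeys the linear ODE $\dot z = Lz$, then $z(1) = e^{L} z(0)$ exactly. The sketch below illustrates only that fact on a toy problem and is not the paper's implementation. The generator `L` is a random matrix standing in for a learned one, `z0` stands in for encoded noise, the encoder, decoder, and time augmentation are omitted, and the helper names `expm_via_eig` and `rollout_euler` are hypothetical. The matrix exponential is computed from the eigendecomposition of `L`, which assumes `L` is diagonalizable.

```python
import numpy as np


def expm_via_eig(L, t=1.0):
    """Return e^{tL} using the eigendecomposition of L.

    Assumes L is diagonalizable. For a real L the result is real up to
    round-off, so the imaginary part is discarded.
    """
    eigvals, eigvecs = np.linalg.eig(L)
    return (eigvecs @ np.diag(np.exp(t * eigvals)) @ np.linalg.inv(eigvecs)).real


def rollout_euler(L, z0, n_steps=5000):
    """Integrate dz/dt = L z from t=0 to t=1 with explicit Euler steps."""
    z = z0.copy()
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        z = z + dt * (z @ L.T)  # each row z_i is updated by dt * L z_i
    return z


rng = np.random.default_rng(0)
d = 8
L = 0.2 * rng.standard_normal((d, d))  # toy generator (assumption, not learned)
z0 = rng.standard_normal((16, d))      # batch of 16 toy latents

# One matrix product for the whole batch versus thousands of solver steps.
z1_one_step = z0 @ expm_via_eig(L).T
z1_euler = rollout_euler(L, z0)

max_gap = np.abs(z1_one_step - z1_euler).max()
print(f"max |one-step - Euler| = {max_gap:.2e}")
```

The same eigendecomposition that gives $e^{L}$ also exposes the eigenvalues and eigenvectors of `L`, which is what makes a spectral reading of the dynamics possible in the linear latent space.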

Paper Structure

This paper contains 44 sections, 3 theorems, 42 equations, 9 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

The Koopman observable coordinates $g$ are identifiable only up to an arbitrary invertible linear transformation $M$. If the pair $(g, L)$ satisfies the consistency and phase objectives, so does the transformed pair $(M^{-1}g,~ M^{-1}LM)$.
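The invariance in Proposition 1 can be checked directly under a simplified reading of the consistency condition. As an assumption (the exact consistency and phase objectives are not reproduced in this summary), suppose consistency means that the lifted state $g(x_t)$ obeys $\frac{d}{dt} g(x_t) = L\, g(x_t)$ along teacher trajectories. Writing $\tilde g = M^{-1} g$ and $\tilde L = M^{-1} L M$ for an invertible $M$:

```latex
\begin{align*}
\frac{d}{dt}\,\tilde g(x_t)
  &= M^{-1}\,\frac{d}{dt}\,g(x_t)
   = M^{-1} L\, g(x_t)
   = \bigl(M^{-1} L M\bigr)\bigl(M^{-1} g(x_t)\bigr)
   = \tilde L\,\tilde g(x_t), \\
e^{\tilde L}
  &= e^{M^{-1} L M}
   = M^{-1} e^{L} M .
\end{align*}
```

So the transformed pair satisfies the same linear dynamics, and its one-step map is the original one expressed in the new coordinates. Because $\tilde L$ is similar to $L$, the two share the same eigenvalues; if $Lv = \lambda v$ then $\tilde L (M^{-1}v) = \lambda (M^{-1}v)$, so only the eigenvectors change with $M$.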

Figures (9)

  • Figure 1: Overview of our approach: we propose to apply Koopman theory to the dynamics of generative modeling from continuous normalizing flow models. We learn a Koopman latent space and its linear dynamics from a given non-linear CNF model. This approach presents two direct applications: one-step sampling and flow model interpretability.
  • Figure 2: FID score as a function of Koopman dimension on the FFHQ dataset. The higher the dimension, the lower the FID.
  • Figure 3: Effect of varying the amplitude of the perturbation of the latent of a real image in a given direction. Results for the model trained with consistency loss (top rows), versus without (bottom row).
  • Figure 4: Recovering Koopman Modes in Pixel-Space Dynamics. (Left Column) The optimized, structured noise perturbation ($x_{\text{pert}}^i$) for 4 Koopman modes $v_i$. (Center Columns) Images generated by the CFM model from initial noise $x_0' = x_0 + \alpha x_{\text{pert}}^i v_i$ with increasing $\alpha$. (Right Column) The image generated by directly decoding the target Koopman mode $v_i$.
  • Figure 5: t-SNE visualization of CFM and Koopman trajectories in the embedding space on FFHQ. The consistency loss makes Koopman rollouts (dotted) follow the teacher dynamics (continuous) more closely. This is seen both in the proximity of trajectories and in the alignment of their endpoints. Circles mark starting points and squares mark end points.
  • ...and 4 more figures

Theorems & Definitions (6)

  • Proposition 1: Non-identifiability up to linear transformation
  • Proposition 2: Marginal vs. Conditional Objectives
  • Proposition 3: Practical Estimator for the Consistency Loss
  • proof
  • proof
  • proof