
Flow Matching: Markov Kernels, Stochastic Processes and Transport Plans

Christian Wald, Gabriele Steidl

TL;DR

This work surveys flow matching from a rigorous mathematical standpoint, unifying three perspectives (couplings, Markov kernels, and stochastic processes) for constructing velocity fields that transport a latent distribution to a data distribution within the Wasserstein geometry. It shows how to learn these velocity fields through flow matching losses and demonstrates applications to Bayesian inverse problems via conditional Wasserstein distances and partial constraints on couplings. The paper also situates flow matching relative to continuous normalizing flows and score-based diffusion, highlighting practical algorithmic schemes and numerical demonstrations. The resulting framework offers a principled route to sampling and posterior inference by evolving probability measures along absolutely continuous (AC) curves rather than learning a single transport map. Overall, it provides both theoretical foundations and practical tools for scalable, geometry-aware generative modeling and inverse problems.

Abstract

Among generative neural models, flow matching techniques stand out for their simple applicability and good scaling properties. Here, velocity fields of curves connecting a simple latent distribution and a target distribution are learned. The corresponding ordinary differential equation can then be used to sample from the target distribution, starting from samples of the latent one. This paper reviews, from a mathematical point of view, different techniques to learn the velocity fields of absolutely continuous curves in the Wasserstein geometry. We show how the velocity fields can be characterized and learned via i) transport plans (couplings) between latent and target distributions, ii) Markov kernels and iii) stochastic processes, where the latter two include the coupling approach but are in general broader. Besides this main goal, we show how flow matching can be used for solving Bayesian inverse problems, where the definition of conditional Wasserstein distances plays a central role. Finally, we briefly address continuous normalizing flows and score matching techniques, which approach the learning of velocity fields of curves from other directions.
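
The learn-then-sample pipeline described in the abstract can be sketched in one dimension. This is a minimal illustration under stated assumptions, not the authors' algorithm. The assumptions are: the latent is $N(0,1)$, the target is a Gaussian $N(m, s^2)$, the coupling is the independent one $\mu \otimes \nu$, and the path is the straight line $x_t = (1-t)x_0 + t x_1$. Instead of training a neural network, the flow matching loss $\mathbb{E}\,\|v(t, x_t) - (x_1 - x_0)\|^2$ is minimized over affine functions $v(t, x) = a(t) + b(t)x$ on a time grid. In this Gaussian case $(x_t, x_1 - x_0)$ is jointly Gaussian, so the minimizer $\mathbb{E}[x_1 - x_0 \mid x_t = x]$ is itself affine and the model class is exact. The function names `fit_velocity_1d` and `sample_ode` are hypothetical.

```python
import numpy as np

def fit_velocity_1d(m, s, n_steps=100, n_samples=20_000, seed=0):
    """Flow matching in 1-D: regress x1 - x0 on an affine function of x_t.

    Illustrative assumptions: latent N(0, 1), target N(m, s^2),
    independent coupling, straight-line path x_t = (1 - t) x0 + t x1.
    Returns one pair (a_k, b_k) per grid time t_k = k / n_steps.
    """
    rng = np.random.default_rng(seed)
    coeffs = np.empty((n_steps, 2))
    for k in range(n_steps):
        t = k / n_steps                                  # left endpoint of Euler step k
        x0 = rng.standard_normal(n_samples)              # latent samples
        x1 = m + s * rng.standard_normal(n_samples)      # target samples, independent of x0
        xt = (1.0 - t) * x0 + t * x1                     # point on the conditional path
        u = x1 - x0                                      # conditional velocity d/dt x_t
        A = np.column_stack([np.ones(n_samples), xt])    # affine features [1, x]
        # Least squares = empirical flow matching loss minimized over affine v(t, .)
        coeffs[k] = np.linalg.lstsq(A, u, rcond=None)[0]
    return coeffs

def sample_ode(coeffs, n, seed=1):
    """Explicit Euler for dx/dt = v(t, x), started at fresh latent samples."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    h = 1.0 / len(coeffs)
    for a, b in coeffs:
        x = x + h * (a + b * x)
    return x

coeffs = fit_velocity_1d(m=3.0, s=0.5)
samples = sample_ode(coeffs, n=10_000)
# The empirical mean and std should approximate m = 3.0 and s = 0.5.
print(samples.mean(), samples.std())
```

Explicit Euler is only the simplest choice of ODE solver here. The grid size `n_steps` controls both the time discretization of the learned field and the integration error.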

Paper Structure

This paper contains 34 sections, 36 theorems, 249 equations, 13 figures, 2 algorithms.

Key Result

Theorem 2.6

Let $\mu \in \mathcal{P}_2(\mathbb{R}^d)$ be absolutely continuous and $\nu \in \mathcal{P}_2(\mathbb{R}^d)$. Then there is a unique plan $\alpha \in \Gamma_o(\mu, \nu)$, which is induced by a unique measurable optimal transport map, also called Monge map, $T\colon \mathbb{R}^d \to \mathbb{R}^d$, i.e., $\alpha = (\mathrm{Id}, T)_\sharp \mu$ and $W_2^2(\mu,\nu) = \int_{\mathbb{R}^d} \|x - T(x)\|^2 \,\mathrm{d}\mu(x)$. Further, $T = \nabla \psi$, where $\psi\colon \mathbb{R}^d \to (-\infty,+\infty]$ is convex and lower semicontinuous.
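
Theorem 2.6 guarantees existence and uniqueness of the Monge map but gives no formula for it. It is known explicitly for two Gaussians, which gives a quick sanity check of the statement. Take $\mu = N(m_\mu, \Sigma_\mu)$ with $\Sigma_\mu$ positive definite and $\nu = N(m_\nu, \Sigma_\nu)$. Then the map is affine, $T(x) = m_\nu + A(x - m_\mu)$ with $A = \Sigma_\mu^{-1/2}(\Sigma_\mu^{1/2}\Sigma_\nu\Sigma_\mu^{1/2})^{1/2}\Sigma_\mu^{-1/2}$. Since $A$ is symmetric positive semidefinite, $T = \nabla\psi$ for a convex quadratic $\psi$. The sketch below uses this standard closed form, which is not taken from the paper, with made-up parameter values. It checks that $T_\sharp\mu$ has covariance $\Sigma_\nu$ and that a Monte Carlo estimate of $\int \|x - T(x)\|^2\,\mathrm{d}\mu$ matches the closed-form $W_2^2(\mu,\nu)$.

```python
import numpy as np

def sym_sqrt(M):
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_monge_map(m_mu, S_mu, m_nu, S_nu):
    """Affine Monge map T(x) = A x + b from N(m_mu, S_mu) to N(m_nu, S_nu).

    Requires S_mu positive definite, so that mu is absolutely continuous.
    A is symmetric PSD, hence T = grad psi with the convex quadratic
    psi(x) = 0.5 x^T A x + b^T x.
    """
    R = sym_sqrt(S_mu)
    R_inv = np.linalg.inv(R)
    A = R_inv @ sym_sqrt(R @ S_nu @ R) @ R_inv
    A = 0.5 * (A + A.T)                      # remove floating-point asymmetry
    b = m_nu - A @ m_mu
    return A, b

def gaussian_w2_squared(m_mu, S_mu, m_nu, S_nu):
    """Closed-form squared Wasserstein-2 distance between two Gaussians."""
    R = sym_sqrt(S_mu)
    cross = sym_sqrt(R @ S_nu @ R)
    return float(np.sum((m_mu - m_nu) ** 2) + np.trace(S_mu + S_nu - 2.0 * cross))

# Hypothetical parameters in R^2.
m_mu, S_mu = np.zeros(2), np.array([[2.0, 0.5], [0.5, 1.0]])
m_nu, S_nu = np.array([1.0, -1.0]), np.array([[1.0, -0.3], [-0.3, 0.5]])
A, b = gaussian_monge_map(m_mu, S_mu, m_nu, S_nu)

rng = np.random.default_rng(0)
x = rng.multivariate_normal(m_mu, S_mu, size=200_000)
y = x @ A.T + b                                  # samples of T_# mu
mc_cost = np.mean(np.sum((x - y) ** 2, axis=1))  # Monte Carlo integral of |x - T(x)|^2 dmu
w2_sq = gaussian_w2_squared(m_mu, S_mu, m_nu, S_nu)
print(mc_cost, w2_sq)
```

The identity $A\Sigma_\mu A = \Sigma_\nu$ follows directly from the formula for $A$. This identity, together with the symmetry of $A$, is what makes $T$ both a transport map and the gradient of a convex function.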

Figures (13)

  • Figure 1: Illustration of a curve from the standard Gaussian distribution to a Gaussian mixture in one dimension. The plot shows $t \mapsto \phi(t,x^i)$, $i=1,2$ for two different samples $x^i$ (black, green), the vectors $(1, \partial_t \phi)$ and the red-blue color-coded velocity field $\partial_t \phi$. Courtesy: Blogpost sego
  • Figure 2: Disintegration of the measure $\alpha \in \mathcal{P}(\mathbb{R} \times \mathbb{R})$ (left). Measures $\alpha^{-0.3} \in \mathcal{P}(\mathbb{R})$ (middle, green) and $\alpha^{0.2}\in \mathcal{P}(\mathbb{R})$ (right, red).
  • Figure 3: Plan/Coupling of two discrete measures $\mu$ and $\nu$ (left) and Markov kernel/disintegration (right) with row and column sums.
  • Figure 4: Curve induced by $\alpha=(\mathrm{Id},T)_\sharp\mu_0$ from $\mu_0 = \delta_{x_0}$, resp. $\mu_0 = \tfrac12 (\delta_{x_0} + \delta_{x_1})$, to $\mu_1 = \tfrac12 (\delta_{y_0} + \delta_{y_1})$. In (c), at the crossing time $s$ of the paths, there does not exist a map $T_s$ that induces an element of $\Gamma_o(\mu_s,\mu_1)$. Red arrows: vector fields computed via Eq. [eq:speed_1].
  • Figure 5: Illustration for Example [example:not_tangential]. Both vector fields generate the same curve $\mu_t$, but in (d) mass is rotated unnecessarily, which makes the trajectories of single particles longer than for the vector field in (c).
  • ...and 8 more figures
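
For discrete measures, the correspondence between plans and Markov kernels in Figure 3 reduces to matrix operations. The marginals $\mu$ and $\nu$ are the row and column sums of the plan $\alpha$. Dividing each row by its sum gives the kernel $x_i \mapsto \alpha^{x_i}$, so that $\alpha_{ij} = \mu_i K_{ij}$ and $\mu K = \nu$. The following is a minimal sketch with a made-up $3 \times 2$ plan; `disintegrate` and `sample_via_kernel` are hypothetical helper names. The second function samples from $\alpha$ in two stages, first $x_i \sim \mu$ and then $y_j \sim \alpha^{x_i}$, which is the Markov kernel view of a coupling.

```python
import numpy as np

def disintegrate(alpha):
    """Split a discrete plan alpha (m x n, entries sum to 1) into marginals and a kernel.

    Row i of K is the conditional law alpha^{x_i} = alpha(x_i, .) / mu(x_i);
    rows with mu(x_i) = 0 are left as zeros (the kernel is arbitrary there).
    """
    mu = alpha.sum(axis=1)   # first marginal: row sums
    nu = alpha.sum(axis=0)   # second marginal: column sums
    K = np.divide(alpha, mu[:, None], out=np.zeros_like(alpha), where=mu[:, None] > 0)
    return mu, nu, K

def sample_via_kernel(mu, K, n, seed=0):
    """Draw index pairs (i, j) ~ alpha in two stages: i ~ mu, then j ~ K[i, :]."""
    rng = np.random.default_rng(seed)
    i = rng.choice(len(mu), size=n, p=mu)
    cdf = np.cumsum(K[i], axis=1)                           # conditional CDF per sample
    u = rng.random(n)[:, None]
    j = np.minimum((u > cdf).sum(axis=1), K.shape[1] - 1)   # inverse-CDF sampling
    return i, j

# Hypothetical plan between mu on 3 points and nu on 2 points.
alpha = np.array([[0.10, 0.20],
                  [0.25, 0.05],
                  [0.15, 0.25]])
mu, nu, K = disintegrate(alpha)
print(mu, nu)
print(K)   # each row sums to 1
```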

Theorems & Definitions (84)

  • Remark 2.1: Test functions
  • Remark 2.2: Random variables
  • Remark 2.3: Kullback-Leibler divergence
  • Remark 2.4: Random variables
  • Example 2.5
  • Theorem 2.6: Brenier
  • Example 2.7
  • Theorem 3.1
  • Remark 3.2
  • Theorem 3.3
  • ...and 74 more