On Flow Matching KL Divergence

Maojiang Su, Jerry Yao-Chieh Hu, Sophia Pi, Han Liu

TL;DR

The paper addresses the distributional error of flow matching (FM) by deriving a non-asymptotic KL bound that links the training loss to the divergence between the true data distribution and the FM estimate. Using the KL Evolution Identity and Grönwall's inequality, it proves $\mathrm{KL}(p_1 \Vert q_1) \le A_1 \epsilon + A_2 \epsilon^2$, then translates this bound into TV convergence rates for Flow Matching Transformers under Hölder smoothness, including near-minimax optimality. The work complements existing diffusion-model analyses by giving direct information-theoretic control of distributional error in a deterministic, ODE-based setting, and supports the theory with numerical studies on synthetic and learned velocities. Overall, the paper establishes the statistical efficiency of flow matching in KL and TV, and clarifies which regularity assumptions are needed for the distributional guarantees to hold.

Abstract

We derive a deterministic, non-asymptotic upper bound on the Kullback-Leibler (KL) divergence of the flow-matching distribution approximation. In particular, if the $L_2$ flow-matching loss is bounded by $\epsilon^2 > 0$, then the KL divergence between the true data distribution and the estimated distribution is bounded by $A_1 \epsilon + A_2 \epsilon^2$. Here, the constants $A_1$ and $A_2$ depend only on the regularities of the data and velocity fields. Consequently, this bound implies statistical convergence rates of Flow Matching Transformers under the Total Variation (TV) distance. We show that flow matching achieves nearly minimax-optimal efficiency in estimating smooth distributions. Our results make the statistical efficiency of flow matching comparable to that of diffusion models under the TV distance. Numerical studies on synthetic and learned velocities corroborate our theory.
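
One standard way to pass from a KL bound to a TV bound is Pinsker's inequality. The sketch below assumes that route and the convention $\mathrm{TV}(p, q) = \sup_A |p(A) - q(A)|$; the paper's own argument and constants may differ.

```latex
\mathrm{TV}(p_1, q_1)
  \;\le\; \sqrt{\tfrac{1}{2}\,\mathrm{KL}(p_1 \Vert q_1)}
  \;\le\; \sqrt{\tfrac{1}{2}\left(A_1 \epsilon + A_2 \epsilon^2\right)}
```

Under this reading, a small loss level $\epsilon$ gives a TV error of order $\sqrt{\epsilon}$, since the linear term $A_1 \epsilon$ dominates when $\epsilon$ is small.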

Paper Structure

This paper contains 27 sections, 14 theorems, 86 equations, 4 figures.

Key Result

Lemma 3.1

Let $u(x,t), v(x,t) \in C([0,1];(C^1(\mathbb{R}^d))^d)$ be two velocity fields. Let $p_t$ and $q_t$ be two paths of differentiable probability densities on $\mathbb{R}^d$ evolving under the continuity equations with the same initial distribution $p_0 = q_0$. Then for all $t \in [0,1]$,
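
The identity itself is not reproduced in this summary. As a sketch, assume $p_t$ is transported by $u$ and $q_t$ by $v$ (the pairing is an assumption here), and that the densities decay fast enough at infinity for integration by parts. The standard computation then gives the following, which may differ in form from the paper's exact statement:

```latex
% Continuity equations (assumed pairing of densities and velocities)
\partial_t p_t + \nabla \cdot (p_t\, u) = 0,
\qquad
\partial_t q_t + \nabla \cdot (q_t\, v) = 0 .

% Differentiate KL(p_t || q_t) = \int p_t \log(p_t / q_t) dx and integrate by parts
\frac{d}{dt}\,\mathrm{KL}(p_t \Vert q_t)
  = \int_{\mathbb{R}^d} p_t(x)\,\bigl(u(x,t) - v(x,t)\bigr)^{\top}
    \bigl(\nabla \log p_t(x) - \nabla \log q_t(x)\bigr)\,dx .

% Integrate in time, using p_0 = q_0 so that KL(p_0 || q_0) = 0
\mathrm{KL}(p_t \Vert q_t)
  = \int_0^t \int_{\mathbb{R}^d} p_s(x)\,\bigl(u(x,s) - v(x,s)\bigr)^{\top}
    \bigl(\nabla \log p_s(x) - \nabla \log q_s(x)\bigr)\,dx\,ds .
```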

Figures (4)

  • Figure 1: Closed-Form KL Identity (Lemma 3.1) Verification without Learning. Here $p_t$ evolves under $a_1(t)=\sin(\pi t)$ while $q_t$ evolves under $a_3(t)=t-\tfrac{1}{2}$.
  • Figure 2: KL Identity (Lemma 3.1) Verification with Learned Velocity Field $c_\theta$. The model is trained on $a_2(t)=0.3\sin(2\pi t)+0.2$ until validation MSE $\le 0.05$. Sampling from $p_t$ (also under $a_2$), we compare the empirical KL divergence (dark grey) with the integrated RHS (dark red, dashed).
  • Figure 3: Closed-Form KL Error Bound (Theorem 3.1) Verification. Uses schedule $a_3$ with constant perturbations. Line plot showing $\mathrm{KL}(p_1 \Vert q_1)$ versus $\epsilon \sqrt{S}$ for synthetic velocity fields $v(x,t)=\bigl(a_3(t)+\delta(t)\bigr)x$ with $\delta(t)=\beta$, $\beta\in\{0,0.025,\dots,0.2\}$. Each point represents one perturbation configuration.
  • Figure 4: KL Error Bound (Theorem 3.1) Verification with Learned Velocity Field. Uses schedule $a_1$. Line plot showing $\mathrm{KL}(p_1 \Vert q_1^\theta)$ (dark grey) and $\epsilon_\theta \sqrt{S_\theta}$ (dark red) versus $\epsilon_\theta$ (RMS flow-matching loss) on log-log axes for multiple checkpoints during training. Each point represents a checkpoint at a different training stage.
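
The closed-form check described in Figure 1 can be sketched numerically. This is a minimal illustration, not the authors' experiment. It assumes linear velocity fields $u(x,t)=a_1(t)\,x$ and $v(x,t)=a_3(t)\,x$ (the linear form is taken from Figure 3), a shared standard Gaussian initial distribution $N(0, I_d)$, and $d=2$; the initial distribution, the dimension, and all function names are hypothetical. Under these assumptions both paths stay Gaussian, so the KL divergence and the right-hand side of the identity are available in closed form.

```python
import math

# Schedules from Figure 1 and their antiderivatives A(t) = \int_0^t a(s) ds.
def a1(t): return math.sin(math.pi * t)
def a3(t): return t - 0.5
def A1(t): return (1.0 - math.cos(math.pi * t)) / math.pi
def A3(t): return 0.5 * t * t - 0.5 * t

def variance_ratio(t):
    # With x' = a(t) x and x_0 ~ N(0, I) (assumed), sigma_t^2 = exp(2 A(t)).
    # r is the ratio of the p-path variance to the q-path variance.
    return math.exp(2.0 * (A1(t) - A3(t)))

def kl_closed_form(t, d=2):
    # KL( N(0, s_p^2 I_d) || N(0, s_q^2 I_d) ) = d/2 * (r - 1 - log r).
    r = variance_ratio(t)
    return 0.5 * d * (r - 1.0 - math.log(r))

def kl_rate(t, d=2):
    # Integrand of the identity: E_p[(u - v) . (grad log p - grad log q)]
    # which for these Gaussians equals d * (a1 - a3) * (r - 1).
    return d * (a1(t) - a3(t)) * (variance_ratio(t) - 1.0)

def integrate(f, t, n=2000):
    # Composite trapezoid rule on [0, t].
    h = t / n
    total = 0.5 * (f(0.0) + f(t))
    for i in range(1, n):
        total += f(i * h)
    return total * h

lhs = kl_closed_form(1.0)        # KL(p_1 || q_1) from the Gaussian formula
rhs = integrate(kl_rate, 1.0)    # time-integrated right-hand side
print(f"closed-form KL: {lhs:.6f}, integrated RHS: {rhs:.6f}")
```

The two printed values should agree up to quadrature error, which is the kind of agreement Figure 1 is described as showing.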

Theorems & Definitions (27)

  • Lemma 3.1: KL Evolution Identity for Continuity Flows; Lemma 21 of albergo2023stochastic
  • proof
  • Lemma 3.2: KL Difference for Continuity Flows
  • proof
  • Lemma 3.3: Grönwall's Inequality; gronwall1919note
  • Theorem 3.1: Flow Matching KL Error Bounds
  • proof : Proof Sketch
  • Definition 4.1: Hölder Space
  • Lemma 4.1: Velocity Estimation with Transformer; Theorem 4.2 of su2025high
  • Theorem 4.1: Convergence Rate under Total Variation Distance
  • ...and 17 more