Flow matching achieves almost minimax optimal convergence

Kenji Fukumizu; Taiji Suzuki; Noboru Isobe; Kazusato Oko; Masanori Koyama

TL;DR

This paper analyzes Flow Matching (FM), a simulation-free generative modeling approach that learns a time-dependent vector field and generates samples by integrating an ODE from a standard normal initial condition. It proves that, for targets in Besov spaces $B^s_{p',q'}$ with $1\le p\le 2$, FM achieves an almost minimax convergence rate in the $W_p$ metric as the training size $n$ grows, specifically ${\mathbb E}[W_p(\widehat P_{[1-T_0]},P_{true})] = O\left(n^{-(s+(2\kappa)^{-1}-\delta)/(2s+d)}\right)$, under variance-decay $\sigma_t\sim t^{\kappa}$ with $\kappa\ge 1/2$ and an early-stopped ODE. The paper extends prior diffusion-model analyses to a broader FM setting, reveals the critical role of the Gaussian conditional kernel’s variance decay, and introduces a time-partitioning neural-network scheme to obtain near-optimal rates. It also discusses practical implications, such as the KDE-like behavior that motivates stopping before $\tau=1$, and outlines future directions for removing time-splitting and extending theoretical guarantees beyond the current Gaussian-path framework.
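To make the stated rate concrete, the following minimal sketch evaluates the exponent $(s+(2\kappa)^{-1}-\delta)/(2s+d)$ from the bound above for a few parameter choices. The function name `fm_rate_exponent` and the example values of $s$, $d$, and $\kappa$ are illustrative assumptions, not taken from the paper.

```python
def fm_rate_exponent(s, d, kappa, delta=0.0):
    """Exponent a in the bound O(n^{-a}), with a = (s + 1/(2*kappa) - delta) / (2*s + d).

    s: Besov smoothness, d: data dimension, kappa: variance-decay exponent,
    delta: the small slack in the bound. Hypothetical helper for illustration.
    """
    if kappa < 0.5:
        raise ValueError("the stated bound assumes kappa >= 1/2")
    return (s + 1.0 / (2.0 * kappa) - delta) / (2.0 * s + d)


# Illustrative values only: smoothness s = 2 in dimension d = 4.
print(fm_rate_exponent(2.0, 4, kappa=0.5))  # 0.375
print(fm_rate_exponent(2.0, 4, kappa=1.0))  # 0.3125
```

Under the constraint $\kappa\ge 1/2$, the exponent is largest at $\kappa=1/2$, where it equals $(s+1-\delta)/(2s+d)$; larger $\kappa$ gives a slower rate in this bound.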

Abstract

Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM for large sample size under the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve an almost minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain almost optimal rates.
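The sampling procedure described above (draw from a normal distribution, then integrate an ODE) can be sketched as follows. This is a toy illustration under loud assumptions: the target is a 1-D Gaussian so that the vector field has a closed form and stands in for the learned network, the path is the linear interpolation $x_\tau=(1-\tau)x_0+\tau x_1$ (so $\sigma_{[\tau]}=1-\tau$, i.e. $\kappa=1$), and the solver is plain forward Euler. The names `velocity`, `sample`, and `t0` are hypothetical and not from the paper.

```python
import numpy as np


def velocity(x, tau, mean=2.0, std=0.5):
    """Closed-form marginal vector field for a toy target N(mean, std^2).

    Assumes x_tau = (1 - tau) * x0 + tau * x1 with x0 ~ N(0, 1) independent of x1.
    In practice this function would be replaced by a trained network.
    """
    var_tau = (1.0 - tau) ** 2 + (tau * std) ** 2   # Var[x_tau]
    half_dvar = tau * std ** 2 - (1.0 - tau)        # 0.5 * d/dtau Var[x_tau]
    return mean + (half_dvar / var_tau) * (x - tau * mean)


def sample(n_samples, t0=0.05, n_steps=200, seed=0):
    """Integrate dx/dtau = velocity(x, tau) from tau = 0 to tau = 1 - t0 (early stopping)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)              # initial condition from N(0, 1)
    dt = (1.0 - t0) / n_steps
    for k in range(n_steps):
        x = x + dt * velocity(x, k * dt)            # forward Euler step
    return x


samples = sample(20_000)
print(samples.mean(), samples.std())
```

With `t0 > 0` the integration stops at $\tau = 1-T_0$, so the samples follow the path marginal slightly before the target; in this toy case that marginal is Gaussian with mean $(1-T_0)\cdot\text{mean}$ and variance $T_0^2 + (1-T_0)^2\,\text{std}^2$.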

Paper Structure (36 sections, 23 theorems, 174 equations, 1 figure)

Introduction
Flow matching
Review of flow matching
Path construction
Convergence rate of flow matching
Kernel density estimation and early stopping of ODE
Related works
Theoretical details
Problem Setting
Assumptions
Generalization bound
Complexity term in generalization bound
Approximation error for small $t$
Approximation error for large $t$
Convergence rate under Wasserstein distance
...and 21 more sections

Key Result

Theorem 1

Suppose that the target p.d.f. $p_{[1]}$ lies in the Besov space $B^{s}_{p',q'}([-1,1]^d)$ of smoothness degree $s$, and that the $n$ training data $\{x^{(i)}\}_{i=1}^n$ are an i.i.d. sample from $P_{[1]}$. Assume that $\sigma_{[\tau]} \sim {(1-\tau)}^{\kappa}$ ($\tau\to 1^-$) with $\kappa \geq 1/2$, the condi[...] where $\mathbb{E}$ denotes the expectation over the training data.

Figures (1)

Figure 1: Division of the cube into smoother small regions and the general region.

Theorems & Definitions (34)

Theorem 1: Informal
Proposition 2: Niles-Weed2022-fz
Theorem 3
Theorem 4
Lemma 5
Lemma 6
Theorem 7
Theorem 8
Theorem 9: Main result
Proof: Proof Sketch
...and 24 more
