Flow matching achieves almost minimax optimal convergence
Kenji Fukumizu, Taiji Suzuki, Noboru Isobe, Kazusato Oko, Masanori Koyama
TL;DR
This paper analyzes Flow Matching (FM), a simulation-free generative modeling approach that learns a time-dependent vector field and generates samples by integrating an ODE from a standard normal initial condition. It proves that, for targets in Besov spaces $B^s_{p',q'}$ with $1\le p\le 2$, FM achieves an almost minimax convergence rate in the $W_p$ metric as the training size $n$ grows, specifically ${\mathbb E}[W_p(\widehat P_{[1-T_0]},P_{true})] = O\left(n^{-(s+(2\kappa)^{-1}-\delta)/(2s+d)}\right)$, under variance-decay $\sigma_t\sim t^{\kappa}$ with $\kappa\ge 1/2$ and an early-stopped ODE. The paper extends prior diffusion-model analyses to a broader FM setting, reveals the critical role of the Gaussian conditional kernel’s variance decay, and introduces a time-partitioning neural-network scheme to obtain near-optimal rates. It also discusses practical implications, such as the KDE-like behavior that motivates stopping before $\tau=1$, and outlines future directions for removing time-splitting and extending theoretical guarantees beyond the current Gaussian-path framework.
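To make the stated rate concrete, the short sketch below evaluates the exponent $(s+(2\kappa)^{-1}-\delta)/(2s+d)$ for a few parameter choices. The function name `fm_rate_exponent` and the example values of $s$, $d$, $\kappa$, and $\delta$ are illustrative assumptions, not taken from the paper; the code only performs the arithmetic in the bound quoted above.

```python
def fm_rate_exponent(s, d, kappa, delta=0.0):
    """Exponent r in the bound O(n^{-r}), r = (s + 1/(2*kappa) - delta) / (2*s + d).

    s     : Besov smoothness of the target density
    d     : data dimension
    kappa : variance-decay exponent (the summary requires kappa >= 1/2)
    delta : arbitrarily small positive slack in the bound
    All argument names are hypothetical labels for the symbols in the summary.
    """
    if kappa < 0.5:
        raise ValueError("the stated bound assumes kappa >= 1/2")
    return (s + 1.0 / (2.0 * kappa) - delta) / (2.0 * s + d)


# Illustrative values only: smoothness s = 2 in dimension d = 4.
r_half = fm_rate_exponent(2, 4, 0.5)   # kappa = 1/2 gives (s + 1) / (2s + d)
r_one = fm_rate_exponent(2, 4, 1.0)    # kappa = 1 gives (s + 1/2) / (2s + d)
```

In this formula the exponent shrinks as $\kappa$ grows, so among the admissible values $\kappa = 1/2$ gives the fastest decay in $n$.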
Abstract
Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM for large sample size under the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve an almost minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain almost optimal rates.
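The sampling procedure described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's method: in FM the vector field is a trained neural network, whereas here it is replaced by the closed-form field for a hypothetical one-dimensional Gaussian target $N(\mu, S^2)$ under the linear path $x_t = (1-t)x_0 + t x_1$. The path, the target, the forward Euler solver, and the stopping time `t_end = 0.99` (standing in for $1-T_0$) are all choices made for this example.

```python
import math
import random
import statistics

MU, S = 2.0, 0.5  # hypothetical 1-D target N(MU, S^2), chosen for illustration


def vector_field(x, t):
    """Closed-form marginal field E[x_1 - x_0 | x_t = x] for the assumed linear path.

    A learned network would take the place of this function in practice.
    """
    var_t = (1.0 - t) ** 2 + (t * S) ** 2  # variance of x_t under the assumed path
    return MU + (t * S ** 2 - (1.0 - t)) / var_t * (x - t * MU)


def sample(n, t_end=0.99, steps=200, seed=0):
    """Draw n samples by forward Euler integration from t = 0 to t = t_end < 1."""
    rng = random.Random(seed)
    dt = t_end / steps
    out = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # initial condition from N(0, 1)
        for k in range(steps):
            x += dt * vector_field(x, k * dt)
        out.append(x)
    return out


samples = sample(5000)
mean_hat = statistics.fmean(samples)
std_hat = statistics.pstdev(samples)
# Exact moments of x_t at the stopping time, for comparison.
mean_ref = 0.99 * MU
std_ref = math.sqrt((1.0 - 0.99) ** 2 + (0.99 * S) ** 2)
```

Because the ODE is stopped at `t_end < 1`, the samples follow the intermediate law of $x_{t_{\text{end}}}$ rather than the target itself; comparing `mean_hat` and `std_hat` with `mean_ref` and `std_ref` checks the integration against that intermediate law.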
