Flow matching achieves almost minimax optimal convergence

Kenji Fukumizu, Taiji Suzuki, Noboru Isobe, Kazusato Oko, Masanori Koyama

TL;DR

This paper analyzes Flow Matching (FM), a simulation-free generative modeling approach that learns a time-dependent vector field and generates samples by integrating an ODE from a standard normal initial condition. It proves that, for targets in Besov spaces $B^s_{p',q'}$ with $1\le p\le 2$, FM achieves an almost minimax convergence rate in the $W_p$ metric as the training size $n$ grows, specifically ${\mathbb E}[W_p(\widehat P_{[1-T_0]},P_{true})] = O\left(n^{-(s+(2\kappa)^{-1}-\delta)/(2s+d)}\right)$, under variance-decay $\sigma_t\sim t^{\kappa}$ with $\kappa\ge 1/2$ and an early-stopped ODE. The paper extends prior diffusion-model analyses to a broader FM setting, reveals the critical role of the Gaussian conditional kernel’s variance decay, and introduces a time-partitioning neural-network scheme to obtain near-optimal rates. It also discusses practical implications, such as the KDE-like behavior that motivates stopping before $\tau=1$, and outlines future directions for removing time-splitting and extending theoretical guarantees beyond the current Gaussian-path framework.

Abstract

Flow matching (FM) has gained significant attention as a simulation-free generative model. Unlike diffusion models, which are based on stochastic differential equations, FM employs a simpler approach by solving an ordinary differential equation with an initial condition from a normal distribution, thus streamlining the sample generation process. This paper discusses the convergence properties of FM for large sample size under the $p$-Wasserstein distance, a measure of distributional discrepancy. We establish that FM can achieve an almost minimax optimal convergence rate for $1 \leq p \leq 2$, presenting the first theoretical evidence that FM can reach convergence rates comparable to those of diffusion models. Our analysis extends existing frameworks by examining a broader class of mean and variance functions for the vector fields and identifies specific conditions necessary to attain almost optimal rates.
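
The sampling procedure described above (integrate an ODE from a standard normal draw and stop slightly before the terminal time) can be illustrated with a minimal NumPy sketch. It is not the paper's estimator: in place of a trained neural network it uses the closed-form marginal vector field induced by the empirical data under a Gaussian conditional path. The mean schedule $m_\tau=\tau$, the exact form $\sigma_\tau=(1-\tau)^{\kappa}$, the forward Euler solver, and the names `marginal_field`, `sample`, and `t0` are all assumptions made for illustration.

```python
import numpy as np

def sigma(tau, kappa=0.5):
    # Assumed variance schedule: sigma_tau = (1 - tau)^kappa, so sigma_0 = 1.
    return (1.0 - tau) ** kappa

def dsigma(tau, kappa=0.5):
    # Time derivative of the assumed schedule.
    return -kappa * (1.0 - tau) ** (kappa - 1.0)

def marginal_field(x, tau, data, kappa=0.5):
    """Marginal vector field for the empirical target (stand-in for a network).

    x: (m, d) current particles, data: (n, d) training points.
    Conditional path assumed: x_tau | x_i ~ N(tau * x_i, sigma_tau^2 I).
    """
    s = sigma(tau, kappa)
    diff = x[:, None, :] - tau * data[None, :, :]        # (m, n, d)
    logw = -0.5 * np.sum(diff ** 2, axis=-1) / s ** 2    # log posterior weights
    logw -= logw.max(axis=1, keepdims=True)              # stabilize the softmax
    w = np.exp(logw)
    w /= w.sum(axis=1, keepdims=True)
    # Conditional field: (sigma'/sigma) * (x - tau * x_i) + x_i.
    cond = (dsigma(tau, kappa) / s) * diff + data[None, :, :]
    return np.einsum("mn,mnd->md", w, cond)

def sample(data, n_samples, t0=0.05, n_steps=200, kappa=0.5, seed=0):
    """Forward Euler integration from N(0, I), stopped early at tau = 1 - t0."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, data.shape[1]))
    taus = np.linspace(0.0, 1.0 - t0, n_steps + 1)
    for a, b in zip(taus[:-1], taus[1:]):
        x = x + (b - a) * marginal_field(x, a, data, kappa)
    return x

if __name__ == "__main__":
    toy_data = np.array([[-0.5], [0.5]])   # hypothetical 1-D training set
    draws = sample(toy_data, n_samples=1000)
    print(draws.shape, float(draws.mean()), float(draws.std()))
```

Under these assumptions the marginal at time $\tau$ is a Gaussian mixture centered at the scaled training points, so letting `t0` shrink toward zero makes the samples collapse onto the training data; this is one way to read the KDE-like behavior that motivates early stopping.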

Paper Structure (36 sections, 23 theorems, 174 equations, 1 figure)

Key Result

Theorem 1

Suppose that the target p.d.f. $p_{[1]}$ belongs to the Besov space $B^{s}_{p',q'}([-1,1]^d)$ of smoothness degree $s$, and that the $n$ training data $\{x^{(i)}\}_{i=1}^n$ are an i.i.d. sample from $P_{[1]}$. Assume that $\sigma_{[\tau]} \sim {(1-\tau)}^{\kappa}$ ($\tau\to 1^-$) with $\kappa \geq 1/2$, the condi [...] where $\mathbb{E}$ denotes the expectation over the training data.
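
To make the dependence on $\kappa$ concrete, the exponent of the rate quoted in the TL;DR, $n^{-(s+(2\kappa)^{-1}-\delta)/(2s+d)}$, can be evaluated at a few values. The worked example below is only arithmetic on that formula; the symbol $r(\kappa)$ and the values $s=2$, $d=8$ are illustrative assumptions and do not come from the paper.

```latex
% Exponent of the quoted rate, written as a function of kappa (r is our notation).
\[
  r(\kappa) = \frac{s + \frac{1}{2\kappa} - \delta}{2s + d},
  \qquad
  r\!\left(\tfrac{1}{2}\right) = \frac{s + 1 - \delta}{2s + d},
  \qquad
  r(1) = \frac{s + \tfrac{1}{2} - \delta}{2s + d}.
\]
% Illustrative values (assumed): s = 2, d = 8, so 2s + d = 12.
\[
  r\!\left(\tfrac{1}{2}\right) = \frac{3 - \delta}{12},
  \qquad
  r(1) = \frac{2.5 - \delta}{12}.
\]
```

Since $(2\kappa)^{-1}$ decreases in $\kappa$, the exponent is largest at the boundary value $\kappa = 1/2$ among the admissible $\kappa \geq 1/2$, and faster variance decay (larger $\kappa$) yields a slower rate in this bound.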

Figures (1)

  • Figure 1: Division of the cube into smoother small regions and the general region.

Theorems & Definitions (34)

  • Theorem 1: Informal
  • Proposition 2: Niles-Weed2022-fz
  • Theorem 3
  • Theorem 4
  • Lemma 5
  • Lemma 6
  • Theorem 7
  • Theorem 8
  • Theorem 9: Main result
  • Proof: Proof Sketch
  • ...and 24 more