Convergence of Continuous Normalizing Flows for Learning Probability Distributions

Yuan Gao; Jian Huang; Yuling Jiao; Shurong Zheng

TL;DR

This work establishes non-asymptotic Wasserstein-2 convergence guarantees for simulation-free continuous normalizing flows (CNFs) trained via flow matching. Focusing on CNFs with linear interpolation, it proves Lipschitz regularity of the velocity fields and develops $L^{\infty}$-type neural network approximation bounds that preserve Lipschitz properties. The analysis decomposes the end-to-end distribution error into discretization, velocity-estimation, and early-stopping components, and derives a nearly minimax nonparametric rate of $\widetilde{\mathcal{O}}(n^{-1/(d+5)})$ for learning distributions from finite samples, where the tilde hides polylogarithmic factors. The results also connect to related work on diffusion models and neural-network approximation theory, yielding guarantees for CNF-based distribution learning and highlighting the trade-offs caused by the time-singular behavior of the velocity field near the terminal time.
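
As a reading aid, the three error components can be written out schematically. The notation below is an assumption chosen for illustration, not taken from the paper: $Z$ is a standard Gaussian source, $X_1$ is a draw from the target $p_1$, $T = 1 - \underline{t}$ is the early-stopping time, $\mathcal{F}$ is the network class, and $h$ is the Euler step size. The paper's own conventions and constants may differ.

$$
X_t = (1-t)\,Z + t\,X_1, \qquad v^*(t,x) = \mathbb{E}\left[X_1 - Z \mid X_t = x\right], \qquad t \in [0, T],
$$

$$
\hat v \in \arg\min_{v \in \mathcal{F}} \; \mathbb{E}\,\bigl\| v(t, X_t) - (X_1 - Z) \bigr\|^2 \quad \text{(expectation replaced by a sample average in practice)},
$$

$$
x_{k+1} = x_k + h\,\hat v(t_k, x_k), \qquad t_k = k h, \qquad x_0 \sim N(0, I_d).
$$

Writing $p_T$ for the law of $X_T$, $\hat p_T$ for the law obtained by solving the ODE exactly with $\hat v$, and $\hat p_T^{h}$ for the law of the Euler output, the triangle inequality for the Wasserstein-2 distance gives

$$
W_2\bigl(\hat p_T^{h}, p_1\bigr) \;\le\; \underbrace{W_2\bigl(\hat p_T^{h}, \hat p_T\bigr)}_{\text{discretization}} \;+\; \underbrace{W_2\bigl(\hat p_T, p_T\bigr)}_{\text{velocity estimation}} \;+\; \underbrace{W_2\bigl(p_T, p_1\bigr)}_{\text{early stopping}}.
$$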

Abstract

Continuous normalizing flows (CNFs) are a generative method for learning probability distributions, which is based on ordinary differential equations. This method has shown remarkable empirical success across various applications, including large-scale image synthesis, protein structure prediction, and molecule generation. In this work, we study the theoretical properties of CNFs with linear interpolation in learning probability distributions from a finite random sample, using a flow matching objective function. We establish non-asymptotic error bounds for the distribution estimator based on CNFs, in terms of the Wasserstein-2 distance. The key assumption in our analysis is that the target distribution satisfies one of the following three conditions: it either has a bounded support, is strongly log-concave, or is a finite or infinite mixture of Gaussian distributions. We present a convergence analysis framework that encompasses the error due to velocity estimation, the discretization error, and the early stopping error. A key step in our analysis involves establishing the regularity properties of the velocity field and its estimator for CNFs constructed with linear interpolation. This necessitates the development of uniform error bounds with Lipschitz regularity control of deep ReLU networks that approximate the Lipschitz function class, which could be of independent interest. Our nonparametric convergence analysis offers theoretical guarantees for using CNFs to learn probability distributions from a finite random sample.
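
The sampling pipeline described above can be illustrated with a minimal one-dimensional sketch. Everything here is an assumption made for illustration: the target is a single Gaussian $N(m, s^2)$ (a special case of the Gaussian-mixture class), the interpolation is $X_t = (1-t)Z + tX_1$ with standard Gaussian $Z$, and the closed-form conditional expectation stands in for the learned velocity network. The function names `velocity` and `euler_sample` are hypothetical and do not come from the paper.

```python
import numpy as np


def velocity(t, x, m, s):
    """Closed-form velocity E[X_1 - Z | X_t = x] for X_t = (1 - t) Z + t X_1,
    with Z ~ N(0, 1) independent of X_1 ~ N(m, s^2).
    In the paper's setting this role is played by a trained ReLU network."""
    var_t = (1.0 - t) ** 2 + (t * s) ** 2        # Var(X_t)
    slope = (t * s**2 - (1.0 - t)) / var_t       # Cov(X_1 - Z, X_t) / Var(X_t)
    return m + slope * (x - t * m)               # Gaussian conditional mean


def euler_sample(n, m, s, t_stop=0.99, steps=200, seed=0):
    """Forward Euler integration of dx/dt = v(t, x) from t = 0 to t_stop < 1.
    Stopping before t = 1 mirrors the early-stopping component of the analysis."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                   # draws from the source N(0, 1)
    h = t_stop / steps                           # uniform step size
    for k in range(steps):
        x = x + h * velocity(k * h, x, m, s)     # one Euler step at time k * h
    return x


samples = euler_sample(n=20000, m=2.0, s=0.5)
print("sample mean:", samples.mean())
print("sample std: ", samples.std())
```

With these settings the exact law at the stopping time is Gaussian with mean $0.99\,m$ and variance $0.01^2 + 0.99^2 s^2$, so the printed statistics should land close to those values. The remaining gap to $N(m, s^2)$ is the early-stopping error, and the gap between the Euler output and the exact time-$0.99$ law is the discretization error.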

Paper Structure (45 sections, 53 theorems, 259 equations, 3 figures, 3 tables)

Introduction
Preview of main results
Our contributions
Preliminaries
Definitions
Assumptions
Simulation-free continuous normalizing flows
Construction of simulation-free CNFs
Flow matching
Forward Euler discretization
Main result: Error bounds for distribution estimation
Error decomposition
Error bounds for the estimated distribution
Error analysis of flow matching
Regularity of velocity fields
...and 30 more sections

Key Result

Theorem 1.1

Assume that the target distribution either has a bounded support, is strongly log-concave, or is a mixture of Gaussians. Let $0 < \underline{t} \ll 1$. The velocity fields of the CNFs with linear interpolation have the following regularity properties:

Figures (3)

Figure 1: Functions $g_1$ and $g_2$ for defining a partition of unity.
Figure 2: Functions $\phi_m(t)$ and $\phi_{m+1}(t)$ for defining a partition of unity.
Figure 3: The clipping function $\beta_A$.

Theorems & Definitions (114)

Theorem 1.1: Informal
Remark 1
Theorem 1.2: Informal
Remark 2
Definition 2.1: cattiaux2014semi
Definition 2.2: eldan2018regularization
Definition 2.3: Deep ReLU networks
Definition 2.4: Wasserstein-$2$ distance
Remark 3: Distribution classes
Remark 4: Score Lipschitzness
...and 104 more
