Statistical Properties of Rectified Flow

Gonzalo Mena, Arun Kumar Kuchibhotla, Larry Wasserman

TL;DR

This work develops a rigorous statistical theory for rectified flow, a velocity-field-driven method for constructing transport maps between distributions. It introduces multiple representations of the velocity under the independence coupling and derives four estimation approaches (density-based, regression-based, substitution-based, and semiparametric), along with smoothing variants, to estimate the velocity and the resulting rectified map. The authors establish existence, regularity, and convergence rates for the rectified flow in both unbounded and bounded domains, including central limit theorems and perturbation-based linearization results that quantify how estimation error propagates through the ODE. Through explicit examples, E2E analysis, and numerical experiments, the paper demonstrates that rectified flow can achieve faster rates than standard nonparametric regression or density estimation, and it provides practical, scalable tools for transport-map estimation without solving large variational problems. Overall, the work bridges statistical regression tools and dynamical transport, offering theoretically grounded, implementable methods for approximate transport maps with quantifiable uncertainty.

Abstract

Rectified flow (Liu et al., 2022; Liu, 2022; Wu et al., 2023) is a method for defining a transport map between two distributions, and enjoys popularity in machine learning, although theoretical results supporting the validity of these methods are scant. The rectified flow can be regarded as an approximation to optimal transport, but in contrast to other transport methods that require optimization over a function space, computing the rectified flow only requires standard statistical tools such as regression or density estimation, which we leverage to develop empirical versions of transport maps. We study some structural properties of the rectified flow, including existence, uniqueness, and regularity, as well as the related statistical properties, such as rates of convergence and central limit theorems, for some selected estimators. To do so, we analyze the bounded and unbounded cases separately as each presents unique challenges. In both cases, we are able to establish convergence at faster rates than those for the usual nonparametric regression and density estimation.
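
A minimal sketch of the regression route, under assumptions that are not taken from the paper: the standard rectified-flow construction $X_t = (1-t)X_0 + tX_1$ with $X_0$ and $X_1$ independent, velocity $v_t(z) = E[X_1 - X_0 \mid X_t = z]$, a Gaussian-kernel Nadaraya-Watson regression of $X_1 - X_0$ on $X_t$ refit at every time step, and a forward Euler solver. The function names (`nw_velocity`, `rectified_map_kernel`), the bandwidth, and the sample size are hypothetical choices for illustration. It uses the Gaussian setting of Figure 4, where the true map $R$ is the identity, but with a larger $n$ so the estimate is stable.

```python
import numpy as np


def nw_velocity(z, xt, y, bandwidth):
    """Nadaraya-Watson estimate of E[Y | X_t = z] with a Gaussian kernel."""
    # Scaled differences: rows index query points, columns index samples.
    u = (z[:, None] - xt[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w @ y) / w.sum(axis=1)


def rectified_map_kernel(x0_samples, x1_samples, x_start, bandwidth=0.2, n_steps=50):
    """Forward Euler for dz/dt = v_hat_t(z) from t = 0 to t = 1 (hypothetical helper)."""
    y = x1_samples - x0_samples                 # regression response X_1 - X_0
    z = np.asarray(x_start, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt                                   # left endpoint of step k
        xt = (1 - t) * x0_samples + t * x1_samples   # interpolated samples X_t
        z = z + dt * nw_velocity(z, xt, y, bandwidth)
    return z


rng = np.random.default_rng(0)
n = 5000
x0 = rng.standard_normal(n)   # independent coupling: X_0 and X_1 drawn separately
x1 = rng.standard_normal(n)
starts = np.linspace(-1.0, 1.0, 9)
r_hat = rectified_map_kernel(x0, x1, starts)
print(np.round(r_hat, 2))     # should be close to `starts`, since R is the identity here
```

Refitting the regression at each Euler step keeps the sketch simple; pooling $(t, X_t)$ pairs across times into a single regression is an equally plausible design.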

Paper Structure

This paper contains 72 sections, 64 theorems, 716 equations, 6 figures.

Key Result

Lemma 1

Assuming $\mu_0$ and $\mu_1$ have Lebesgue densities $p_0$ and $p_1$, the velocity field can be equivalently written as follows: where $\mathbb{E}_0[\cdot]$ and $\mathbb{E}_1[\cdot]$ represent expectations with respect to $\mu_0$ and $\mu_1$, respectively, and
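
One way to see why the densities alone determine the velocity, assuming the standard construction $X_t = (1-t)X_0 + tX_1$ with $X_0 \sim \mu_0$ and $X_1 \sim \mu_1$ independent. This is our own short derivation and need not match the exact form stated in Lemma 1. Conditioning on $X_1 = x_1$ and $X_t = z$ fixes $X_0 = (z - t x_1)/(1-t)$, so $X_1 - X_0 = (x_1 - z)/(1-t)$, and the conditional density of $X_1$ given $X_t = z$ is proportional to $p_1(x_1)\, p_0\big((z - t x_1)/(1-t)\big)$. The symmetric argument, conditioning on $X_0$ instead, gives the second line.

```latex
\begin{aligned}
v_t(z) &= \mathbb{E}[X_1 - X_0 \mid X_t = z], \qquad X_t = (1-t)X_0 + tX_1,\\
v_t(z) &= \frac{\mathbb{E}_1\!\left[(X_1 - z)\, p_0\!\left(\frac{z - tX_1}{1-t}\right)\right]}
              {(1-t)\,\mathbb{E}_1\!\left[p_0\!\left(\frac{z - tX_1}{1-t}\right)\right]},
        \qquad 0 \le t < 1,\\
v_t(z) &= \frac{\mathbb{E}_0\!\left[(z - X_0)\, p_1\!\left(\frac{z - (1-t)X_0}{t}\right)\right]}
              {t\,\mathbb{E}_0\!\left[p_1\!\left(\frac{z - (1-t)X_0}{t}\right)\right]},
        \qquad 0 < t \le 1.
\end{aligned}
```

Replacing $\mathbb{E}_0$ and $\mathbb{E}_1$ by sample averages, or $p_0$ and $p_1$ by density estimates, turns each line into a plug-in estimator.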

Figures (6)

  • Figure 1: The figure on the left shows the velocity for the rectified flow when $z = 1$ (solid), $z = 1/2$ (dotted), and $z = 1/4$ (dashed). The figure on the right shows the paths $z(t)$ when $\mu_0 = \mu_1 = N(0,1)$. The resulting map $R(x)$ is the identity map but the paths are nonlinear. The optimal transport path is simply constant.
  • Figure 2: Plot of the rectified flow map $x\mapsto R(x)$ in transporting $X_0\sim 0.5 N(1, 1) + 0.5 N(-1, 1)$ to itself.
  • Figure 3: A plot of $v_t(z)$ versus $z$ for four values of $t$ in the case where $\mu_0 = \mu_1 = \mathrm{Unif}[0,1]$. We see that $v_t(z)$ is piecewise smooth. As $t$ approaches 0 and 1, the Lipschitz constant approaches infinity near the boundary.
  • Figure 4: Estimating trajectories $z_t(x)$ and $R(x)=z_1(x)$ in the one-dimensional case, with the velocity estimated by kernel regression. Each plot shows results for a different choice of bandwidth. We show true and estimated trajectories $z_t(x)=z(t,0,x)$ for several starting points $x$, where $X_0,X_1$ are independent standard Gaussians. Estimators are based on $n=200$ samples. Ground-truth trajectories are shown in black, and colored solid lines show the mean trajectory over 1000 repetitions of the experiment for each starting point (one color per starting point). Shaded regions are 95% empirical intervals from these repetitions. Dashed colored lines show the kernel regression-based estimate from one selected sample.
  • Figure 5: Performance of four rectified-flow estimators in the Gaussian case. We drew $n=100$ samples from $X_1,X_0\sim N\left(0,I_d\right)$ with $d=50$. In plot $(i,j)$ we show $\widehat{R}_i(0,\ldots,x_j,\ldots,0)$ as a function of $x_j\in[-3,3]$; only the first $36 = 6\times 6$ functions are shown. We consider the plug-in estimator (purple), linear regression (orange), cross-validated Lasso (blue), and kernel regression (green). Dashed black lines represent the truth $R_i(0,\ldots,x_j,\ldots,0)=x_j \delta_{i=j}$. In all cases, we used a naive ODE discretization, dividing the interval $[0,1]$ into $T=50$ steps.
  • ...and 1 more figure
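
For the examples in Figures 1 and 3 the velocity has a closed form, assuming the construction $X_t = (1-t)X_0 + tX_1$ with independent coupling; these formulas are our own derivation, offered as a sanity check rather than quoted from the paper. For $\mu_0 = \mu_1 = N(0,1)$, $X_t \sim N(0, s(t))$ with $s(t) = t^2 + (1-t)^2$ and $v_t(z) = (2t-1)z/s(t)$; since $s'(t) = 2(2t-1)$, the path is $z_t(x) = x\sqrt{s(t)}$, which returns to $x$ at $t = 1$ (the identity map) while passing through $x/\sqrt{2}$ at $t = 1/2$, so it is not a straight line. For $\mu_0 = \mu_1 = \mathrm{Unif}[0,1]$, $X_1$ given $X_t = z$ is uniform on $[a, b]$ with $a = \max(0, (z-1+t)/t)$ and $b = \min(1, z/t)$, giving $v_t(z) = ((a+b)/2 - z)/(1-t)$. Its slope near $z = 0$ behaves like $1/(2t)$ as $t \to 0$, and near $z = 1$ like $-1/(2(1-t))$ as $t \to 1$. The helper names below are hypothetical.

```python
import numpy as np


def v_gaussian(t, z):
    """Velocity when mu_0 = mu_1 = N(0, 1): (2t - 1) z / s(t)."""
    s = t**2 + (1 - t) ** 2            # variance of X_t = (1 - t) X_0 + t X_1
    return (2 * t - 1) * np.asarray(z, dtype=float) / s


def path_gaussian(t, x):
    """Exact solution of dz/dt = v_gaussian(t, z) with z(0) = x."""
    return np.asarray(x, dtype=float) * np.sqrt(t**2 + (1 - t) ** 2)


def v_uniform(t, z):
    """Velocity when mu_0 = mu_1 = Unif[0, 1]; valid for 0 < t < 1 and 0 <= z <= 1."""
    z = np.asarray(z, dtype=float)
    a = np.maximum(0.0, (z - 1 + t) / t)   # lower end of the support of X_1 given X_t = z
    b = np.minimum(1.0, z / t)             # upper end of that support
    return (0.5 * (a + b) - z) / (1 - t)   # E[(X_1 - z) / (1 - t) | X_t = z]


# Forward Euler on the Gaussian velocity should track the closed-form path.
x = np.array([-2.0, -1.0, 0.5, 2.0])
n_steps = 2000
z = x.copy()
for k in range(n_steps):
    z = z + v_gaussian(k / n_steps, z) / n_steps
print(np.round(z, 3))                         # approximately x: R is the identity
print(np.round(path_gaussian(0.5, x), 3))     # midpoint of each path: x / sqrt(2)
print(v_uniform(0.5, np.linspace(0, 1, 5)))   # zero velocity at t = 1/2
```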

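A density-based sketch for the mixture in Figure 2, assuming the same construction. Conditioning on $X_1 = x_1$ and $X_t = z$ fixes $X_0 = (z - t x_1)/(1-t)$, so $v_t(z)$ is a ratio of integrals against $p_0$ and $p_1$; the code integrates over $x_1$ for $t \le 1/2$ and over $x_0$ for $t > 1/2$ so that neither divisor $1-t$ nor $t$ approaches zero. The Riemann-sum quadrature, grid limits, step count, and helper names are arbitrary illustrative choices, and the output is not meant to reproduce the figure exactly.

```python
import numpy as np


def normal_pdf(x, mean, sd=1.0):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


def mixture_pdf(x):
    """Density of 0.5 N(1, 1) + 0.5 N(-1, 1), the distribution in Figure 2."""
    return 0.5 * normal_pdf(x, 1.0) + 0.5 * normal_pdf(x, -1.0)


GRID = np.linspace(-10.0, 10.0, 4001)   # quadrature nodes; uniform spacing cancels in the ratio


def velocity(t, z, p0=mixture_pdf, p1=mixture_pdf):
    """v_t(z) = E[X_1 - X_0 | X_t = z] by quadrature, for X_t = (1 - t) X_0 + t X_1."""
    z = np.asarray(z, dtype=float)[:, None]
    if t <= 0.5:
        # Integrate over x_1: X_0 = (z - t x_1) / (1 - t), X_1 - X_0 = (x_1 - z) / (1 - t).
        w = p1(GRID) * p0((z - t * GRID) / (1 - t))
        num = (w * (GRID - z)).sum(axis=1) / (1 - t)
    else:
        # Integrate over x_0: X_1 = (z - (1 - t) x_0) / t, X_1 - X_0 = (z - x_0) / t.
        w = p0(GRID) * p1((z - (1 - t) * GRID) / t)
        num = (w * (z - GRID)).sum(axis=1) / t
    return num / w.sum(axis=1)


def rectified_map(x, n_steps=200):
    """Forward Euler for dz/dt = v_t(z) on [0, 1]."""
    z = np.asarray(x, dtype=float).copy()
    for k in range(n_steps):
        z = z + velocity(k / n_steps, z) / n_steps
    return z


xs = np.linspace(-3.0, 3.0, 13)
print(np.round(rectified_map(xs), 3))
```

Because the mixture is symmetric, the estimated map should be odd, and because one-dimensional flow trajectories cannot cross, it should be increasing.
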
Theorems & Definitions (106)

  • Lemma 1
  • Theorem 1
  • Lemma 2
  • Proposition 1
  • Remark
  • Remark
  • Lemma 3
  • Proposition 2
  • Remark
  • Theorem 2
  • ...and 96 more
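
A sketch of a linear-regression estimator in the Gaussian setting of Figure 5 ($n = 100$, $d = 50$, $T = 50$ Euler steps), assuming the construction $X_t = (1-t)X_0 + tX_1$ with independent coupling: at each step, ordinary least squares of $X_1 - X_0$ on an intercept and $X_t$ is refit on the $n$ samples. This is one plausible reading of "linear regression" in the caption, not a transcription of the paper's estimator, and `rectified_map_linear` is a hypothetical name. For Gaussian marginals, $E[X_1 - X_0 \mid X_t]$ is exactly linear in $X_t$, so the remaining error comes from sampling noise and the Euler discretization.

```python
import numpy as np


def rectified_map_linear(x0, x1, x_start, n_steps=50):
    """Euler-integrate a velocity fitted by least squares of X_1 - X_0 on (1, X_t) at each step."""
    n, d = x0.shape
    y = x1 - x0                                     # (n, d) responses
    z = np.array(x_start, dtype=float)              # (m, d) starting points
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        xt = (1 - t) * x0 + t * x1                  # (n, d) interpolated samples
        design = np.hstack([np.ones((n, 1)), xt])   # intercept plus linear terms
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # (d + 1, d) coefficients
        z = z + dt * (np.hstack([np.ones((len(z), 1)), z]) @ beta)
    return z


rng = np.random.default_rng(0)
n, d = 100, 50                       # sample size and dimension from Figure 5
x0 = rng.standard_normal((n, d))
x1 = rng.standard_normal((n, d))
grid = np.linspace(-3.0, 3.0, 7)
# Inputs (x_1, 0, ..., 0); the truth is R_i = x_1 when i = 1 and 0 otherwise.
pts = np.zeros((len(grid), d))
pts[:, 0] = grid
r_hat = rectified_map_linear(x0, x1, pts)
print(np.round(r_hat[:, :3], 2))
```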