Towards Hierarchical Rectified Flow

Yichi Zhang, Yici Yan, Alex Schwing, Zhizhen Zhao

TL;DR

This work addresses a limitation of classic rectified flow, which captures only the expected velocity field, by introducing Hierarchical Rectified Flow (HRF). HRF learns an acceleration in velocity space through hierarchically coupled ODEs so that the full, possibly multimodal, velocity distribution is modeled. Sampling is a two‑stage process: first draw a velocity sample from the learned $\pi_1(v; x_t,t)$ by integrating the acceleration ODE forward, then use that velocity to update the location. The resulting paths can intersect and are straighter, which reduces the number of neural function evaluations (NFEs). The authors derive analytical forms for the velocity distributions of Gaussian mixtures, propose an acceleration‑matching training objective, and extend the framework to depth $D$ to capture higher‑order dynamics. Empirical results on synthetic 1D/2D data and real image datasets (MNIST, CIFAR‑10, ImageNet‑32) show improved data fit (lower WD/SWD/FID) at similar NFEs, with HRF2 often outperforming the baseline RF; code is released for reproducibility.

Abstract

We formulate a hierarchical rectified flow to model data distributions. It hierarchically couples multiple ordinary differential equations (ODEs) and defines a time-differentiable stochastic process that generates a data distribution from a known source distribution. Each ODE resembles the ODE that is solved in a classic rectified flow, but differs in its domain, i.e., location, velocity, acceleration, etc. Unlike the classic rectified flow formulation, which formulates a single ODE in the location domain and only captures the expected velocity field (sufficient to capture a multi-modal data distribution), the hierarchical rectified flow formulation models the multi-modal random velocity field, acceleration field, etc., in their entirety. This more faithful modeling of the random velocity field enables integration paths to intersect when the underlying ODE is solved during data generation. Intersecting paths in turn lead to integration trajectories that are more straight than those obtained in the classic rectified flow formulation, where integration paths cannot intersect. This leads to modeling of data distributions with fewer neural function evaluations. We empirically verify this on synthetic 1D and 2D data as well as MNIST, CIFAR-10, and ImageNet-32 data. Our code is available at: https://riccizz.github.io/HRF/.
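
The two-stage sampling summarized in the TL;DR can be sketched as a pair of nested Euler loops. This is a minimal illustration rather than the authors' implementation: it assumes that the initial velocity is drawn from a standard Gaussian, that both ODEs are integrated with plain Euler steps, and that a trained network is available as a callable `accel(x, t, v, tau)`. The names `hrf2_sample`, `toy_accel`, and `MU` are hypothetical. In place of a trained network, `toy_accel` uses a closed-form acceleration that is valid only for a target concentrated at the single point `MU`.

```python
import numpy as np

def hrf2_sample(accel, x0, n_x_steps, n_v_steps, rng):
    """Nested Euler sampler for a depth-2 hierarchy (sketch, not the authors' code).

    accel(x, t, v, tau) stands in for a learned acceleration field.
    Total cost is n_x_steps * n_v_steps calls to accel.
    """
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_x_steps
    dtau = 1.0 / n_v_steps
    for i in range(n_x_steps):
        t = i * dt
        # Stage 1: draw v_0 (assumed standard Gaussian) and integrate the
        # velocity ODE dv/dtau = accel(x, t, v, tau) from tau = 0 to tau = 1.
        v = rng.standard_normal(x.shape)
        for k in range(n_v_steps):
            tau = k * dtau
            v = v + dtau * accel(x, t, v, tau)
        # Stage 2: move the location with the sampled velocity.
        x = x + dt * v
    return x

MU = 2.0  # hypothetical point-mass target used only for this toy example

def toy_accel(x, t, v, tau):
    """Closed-form stand-in for a trained network when the target is the point MU."""
    target_v = (MU - x) / (1.0 - t)       # the only velocity consistent with (x, t)
    return (target_v - v) / (1.0 - tau)   # steers v towards target_v as tau -> 1

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)            # source samples from a standard Gaussian
samples = hrf2_sample(toy_accel, x0, n_x_steps=4, n_v_steps=3, rng=rng)
print(float(np.abs(samples - MU).max()))  # largest deviation from the target point
```

In this sketch the number of function evaluations is the product of the two step counts, so the trade-off between location steps and velocity steps can be explored by varying `n_x_steps` and `n_v_steps`.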

Paper Structure

This paper contains 28 sections, 6 theorems, 29 equations, 11 figures, 7 tables, and 4 algorithms.

Key Result

Theorem 1

For $\rho_t(x_t) \neq 0$, the velocity distribution $\pi_1(v; x_t, t)$ at the space-time location $(x_t, t)$ induced by the linear interpolation in eq:lin_int has a closed form, in which $\rho_t$ is expressed through a convolution (denoted '*'). The distribution $\pi_1(v; x_t, t)$ is undefined if $\rho_t(x_t) = 0$.
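
The form of this distribution can be motivated by a change of variables. The following is a sketch and not the paper's exact statement. It assumes that eq:lin_int is the usual rectified-flow interpolation $x_t = (1-t)x_0 + t x_1$, that $x_0 \sim \rho_0$ and $x_1 \sim \rho_1$ are drawn independently, and that the velocity is $v = x_1 - x_0$. The symbol $d$ (data dimension) and the joint density $p(x_t, v)$ are notation introduced here.

$$
\begin{aligned}
x_0 &= x_t - t v, \qquad x_1 = x_t + (1-t) v, \qquad \left|\det \frac{\partial (x_0, x_1)}{\partial (x_t, v)}\right| = 1, \\
p(x_t, v) &= \rho_0(x_t - t v)\,\rho_1(x_t + (1-t) v), \\
\pi_1(v; x_t, t) &= \frac{\rho_0(x_t - t v)\,\rho_1(x_t + (1-t) v)}{\rho_t(x_t)}, \qquad \rho_t(x_t) = \int \rho_0(x_t - t v)\,\rho_1(x_t + (1-t) v)\,dv .
\end{aligned}
$$

Under the same assumptions and for $t \in (0,1)$, $\rho_t$ is the density of the sum of the independent variables $(1-t)x_0$ and $t x_1$, so it equals the convolution $\frac{1}{(1-t)^d}\rho_0\!\left(\frac{\cdot}{1-t}\right) * \frac{1}{t^d}\rho_1\!\left(\frac{\cdot}{t}\right)$ evaluated at $x_t$. The ratio is only meaningful where $\rho_t(x_t) \neq 0$, which matches the condition in the theorem.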

Figures (11)

  • Figure 1: Particles flow from starting points (grey) to endpoints (blue) as time increases from $0$ to $1$. Ideally, the trajectories (green) are straight lines connecting the two ends, as shown in (a). Rectified Flow captures only the expected velocity field, while our Hierarchical Rectified Flow can model the true velocity field, thus generating intersecting and straighter paths.
  • Figure 2: We verify the derived velocity distribution by comparing its probability density function (blue) to the empirical sample histogram (orange) at different times $t$ and locations $x_t$.
  • Figure 3: Numerical estimation of $\pi_1(x_t, t)$ in HRF2 with different numbers of $v$ integration steps. The blue line shows the ground-truth $\pi_1$, where $\rho_0$ is a standard Gaussian and $\rho_1$ is a mixture of two Gaussians. The 1-Wasserstein distances (WD) of the estimates w.r.t. $\pi_1$ are shown in the legend.
  • Figure 4: Results on a 1D example, where $\rho_0$ is a standard Gaussian and $\rho_1$ is a mixture of 5 Gaussians. (a) Histograms of generated samples and $\rho_1$. (b) The 1-Wasserstein distance vs. NFE. (c) and (d) The trajectories of particles flowing from the source distribution (grey) to the target distribution (blue).
  • Figure 5: Results on 2D data. Top row: $\rho_0$ is a standard Gaussian and $\rho_1$ is a mixture of 6 Gaussians. Bottom row: $\rho_0$ is a mixture of 8 Gaussians and $\rho_1$ is represented by the moons data. (a) Sliced 2-Wasserstein distance with respect to NFE. (b) and (c) The trajectories (green) of sample particles flowing from the source distribution (grey) to the target distribution (blue).
  • ...and 6 more figures
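
The comparison described in the Figure 2 caption can be imitated numerically in 1D. The sketch below is not the authors' code. It assumes independently drawn endpoints, the interpolation $x_t = (1-t)x_0 + t x_1$, the velocity $v = x_1 - x_0$, and the unnormalized conditional density $\rho_0(x_t - t v)\,\rho_1(x_t + (1-t) v)$ that follows from those assumptions. The mixture parameters, the window width, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical 1D setup: rho_0 = N(0, 1), rho_1 = equal-weight mixture of two Gaussians.
MEANS = np.array([-1.0, 1.0])
STD = 0.3

def gauss_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def rho0(x):
    return gauss_pdf(x, 0.0, 1.0)

def rho1(x):
    return 0.5 * gauss_pdf(x, MEANS[0], STD) + 0.5 * gauss_pdf(x, MEANS[1], STD)

def velocity_pdf(v_grid, x_t, t):
    """Density of v given x_t, normalized numerically on a uniform grid."""
    unnorm = rho0(x_t - t * v_grid) * rho1(x_t + (1.0 - t) * v_grid)
    dv = v_grid[1] - v_grid[0]
    return unnorm / (unnorm.sum() * dv)

def empirical_velocities(x_t, t, n=1_000_000, half_width=0.02, seed=0):
    """Velocities of sampled pairs whose interpolant lands in a small window around x_t."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)
    x1 = MEANS[rng.integers(0, 2, size=n)] + STD * rng.standard_normal(n)
    xt = (1.0 - t) * x0 + t * x1
    keep = np.abs(xt - x_t) < half_width
    return (x1 - x0)[keep]

x_t, t = 0.3, 0.5
v_grid = np.linspace(-8.0, 8.0, 4001)
pdf = velocity_pdf(v_grid, x_t, t)
samples = empirical_velocities(x_t, t)

# Compare a summary statistic of the two descriptions of the same distribution.
dv = v_grid[1] - v_grid[0]
analytic_mean = float(np.sum(v_grid * pdf) * dv)
empirical_mean = float(samples.mean())
print(f"kept {samples.size} samples; "
      f"analytic mean {analytic_mean:.3f}, empirical mean {empirical_mean:.3f}")
```

A histogram of `samples`, for example from `np.histogram(samples, bins=60, density=True)`, can be overlaid on `pdf` to reproduce the kind of visual check the caption describes. The window `half_width` trades bias against the number of retained samples.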

Theorems & Definitions (6)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Lemma 1
  • Lemma 2
  • Lemma 3