Variational Rectified Flow Matching

Pengsheng Guo, Alexander G. Schwing

TL;DR

This paper tackles a limitation of classic rectified flow matching: it cannot capture multi-modal velocity directions during transport from a source to a target distribution. It introduces Variational Rectified Flow Matching (VRFM), which conditions the velocity field on a latent variable $z$ and trains via a variational lower bound over $z$, effectively modeling velocities as a Gaussian mixture with components $p(v|x_t,t,z)$ while preserving the marginal data distribution. The resulting flows can intersect and reflect ambiguous directions, improving likelihoods and sample quality on synthetic data, MNIST, CIFAR-10, and ImageNet, and the latent space offers a handle for controlling generation. By combining a VAE-style latent mechanism with continuous-time flow matching, VRFM extends flow-based generative modeling to multi-modal velocity fields.
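As a rough sketch of what a variational lower bound over $z$ could look like here, the standard VAE-style bound is written out below. This is an illustration under stated assumptions, not necessarily the exact objective of the paper: the encoder $q_\phi$ and its inputs, the prior $p(z)$, the network name $v_\theta$, and the unit-variance Gaussian likelihood are all assumed for the example.

$$
\log p(v \mid x_t, t) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid x_0, x_1, x_t, t)}\big[\log p_\theta(v \mid x_t, t, z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x_0, x_1, x_t, t)\,\|\,p(z)\big)
$$

If one further assumes $p_\theta(v \mid x_t, t, z) = \mathcal{N}\big(v;\, v_\theta(x_t, t, z),\, I\big)$ and takes the target velocity of a linear interpolation, $v = x_1 - x_0$, then maximizing the bound amounts to minimizing

$$
\mathbb{E}\Big[\tfrac{1}{2}\,\big\|v_\theta(x_t, t, z) - (x_1 - x_0)\big\|^2\Big] \;+\; D_{\mathrm{KL}}\big(q_\phi(z \mid x_0, x_1, x_t, t)\,\|\,p(z)\big),
$$

that is, the usual flow matching regression loss with the velocity network additionally conditioned on $z$, plus a KL regularizer on the latent.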

Abstract

We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching "moves" samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between randomly coupled samples, one drawn from the source distribution and one drawn from the target distribution. This leads to "ground-truth" velocity vector-fields that point in different directions at the same location, i.e., the velocity vector-fields are multi-modal/ambiguous. However, since training uses a standard mean-squared-error loss, the learnt velocity vector-field averages the "ground-truth" directions and isn't multi-modal. In contrast, variational rectified flow matching learns and samples from multi-modal flow directions. We show on synthetic data, MNIST, CIFAR-10, and ImageNet that variational rectified flow matching leads to compelling results.
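The averaging effect described in the abstract can be illustrated with a small NumPy sketch. It is a toy setup, not the paper's experiment: the 1D standard-normal source, the two-point target at -2 and +2, the interpolation convention $x_t = (1-t)x_0 + t x_1$ with velocity $x_1 - x_0$, and the helper names are all assumptions made for the example.

```python
import numpy as np

def sample_pairs(n, rng):
    """Randomly couple source and target samples (toy 1D setup, assumed)."""
    x0 = rng.standard_normal(n)            # source: N(0, 1)
    x1 = rng.choice([-2.0, 2.0], size=n)   # target: two modes at -2 and +2
    return x0, x1

def velocities_near(x_query, t, n=200_000, width=0.05, seed=0):
    """Collect "ground-truth" velocities of pairs whose x_t lies near x_query."""
    rng = np.random.default_rng(seed)
    x0, x1 = sample_pairs(n, rng)
    xt = (1.0 - t) * x0 + t * x1           # linear interpolation
    v = x1 - x0                            # velocity of the straight path
    return v[np.abs(xt - x_query) < width]

# Velocities observed at location x = 0 at time t = 0.
v = velocities_near(x_query=0.0, t=0.0)

# The minimizer of a mean-squared-error loss at a fixed (x_t, t)
# is the mean of the velocities observed there.
mse_optimal = v.mean()

print("MSE-optimal velocity:", mse_optimal)
print("velocity std at this location:", v.std())
print("smallest |velocity| actually observed:", np.abs(v).min())
```

In this toy setup the observed velocities cluster around -2 and +2, while the mean-squared-error minimizer sits near 0, a direction that no coupled pair actually follows. A model that samples a latent per trajectory can instead commit to one of the two directions.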

Paper Structure

This paper contains 34 sections, 13 equations, 17 figures, 5 tables, 2 algorithms.

Figures (17)

  • Figure 1: Intuition and motivation: Rectified flow matching randomly couples source and target data samples, as illustrated in panel (a). This leads to velocity vector-fields with ambiguous directions. Panel (b) shows that classic rectified flow matching averages ambiguous targets, which leads to curved flows. In contrast, the proposed variational rectified flow matching successfully models ambiguity, which leads to less curved flows, as depicted in panel (c).
  • Figure 2: Quantitative evaluation on synthetic 1D data for varying evaluation steps. Metrics are averaged over three runs. For True and Parzen Window Log-Likelihood, higher values are better.
  • Figure 3: 1D velocity ambiguity analysis with various conditioning options and sampling strategies. (a) Ground Truth, (b) Baseline (Rectified Flow), (c) Ours (Variational Rectified Flow). The heatmap illustrates the velocity standard deviation for sampled bins over the data domain and the time domain, along with histograms of the velocity at four sampled locations. Our method effectively models velocity ambiguity, while the baseline produces deterministic outputs.
  • Figure 4: Flow visualization for synthetic 2D data using the Euler solver with 20 function evaluations. Sampled points from the source distribution are shown in red, and points from the target distribution in purple. Different from Rectified FM, which predicts flow trajectories with sharp curvature and U-turns to avoid crossings, our model captures velocity ambiguity and predicts flows that intersect.
  • Figure 5: Quantitative evaluation on synthetic 2D data for varying evaluation steps. Metrics are averaged over three runs with different random seeds.
  • ...and 12 more figures

Theorems & Definitions (1)

  • Claim 1