Variational Rectified Flow Matching
Pengsheng Guo, Alexander G. Schwing
TL;DR
This paper addresses a limitation of classic rectified flow matching: it cannot capture the multi-modal velocity directions that arise when transporting samples from a source to a target distribution. It introduces Variational Rectified Flow Matching (VRFM), which conditions the velocity field on a latent variable $z$ and trains with a variational lower bound over $z$. Marginalizing the conditional velocity distribution $p(v|x_t,t,z)$ over $z$ effectively models a Gaussian mixture of velocities while preserving the marginal data distribution. The resulting flows can intersect and reflect ambiguous directions, improving likelihoods and sample quality on synthetic data, MNIST, CIFAR-10, and ImageNet, and the latent space offers a handle for controlling generation. VRFM thus combines a VAE-style latent mechanism with continuous-time flow matching.
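As a rough illustration of what "a variational lower bound over $z$" can look like, here is the generic conditional-VAE-style bound. It is a sketch under assumptions, not the paper's stated objective: the encoder $q_\phi$, its conditioning inputs (written as $\cdot$), the prior $p(z)$ (taken to be independent of $x_t$ and $t$), and the parameter names $\theta$ and $\phi$ are all introduced here for illustration.

$$
\log p(v \mid x_t, t) \;\ge\; \mathbb{E}_{z \sim q_\phi(z \mid \cdot)}\big[\log p_\theta(v \mid x_t, t, z)\big] \;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid \cdot)\,\|\,p(z)\big)
$$

If $p_\theta(v \mid x_t, t, z)$ is assumed to be a Gaussian with identity covariance and mean $v_\theta(x_t, t, z)$, the first term reduces, up to an additive constant, to $-\tfrac{1}{2}\,\mathbb{E}_{z}\,\|v - v_\theta(x_t, t, z)\|^2$, that is, a squared-error velocity-matching term conditioned on $z$. The exact conditioning and weighting used in the paper may differ.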
Abstract
We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching "moves" samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between randomly coupled samples, one drawn from the source distribution and one drawn from the target distribution. This leads to "ground-truth" velocity vector-fields that point in different directions at the same location, i.e., the velocity vector-fields are multi-modal/ambiguous. However, since training uses a standard mean-squared-error loss, the learnt velocity vector-field averages the "ground-truth" directions and is not multi-modal. In contrast, variational rectified flow matching learns and samples from multi-modal flow directions. We show on synthetic data, MNIST, CIFAR-10, and ImageNet that variational rectified flow matching leads to compelling results.
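The ambiguity described in the abstract can be reproduced with a small numerical sketch. The setup is an assumption chosen for illustration and is not taken from the paper: a 1D standard-normal source, a two-point target at $-1$ and $+1$, and random coupling. The function name `velocity_samples_near` and all of its parameters are hypothetical. The sketch collects the "ground-truth" velocities $v = x_1 - x_0$ of interpolated points $x_t = (1-t)\,x_0 + t\,x_1$ that land near one location and compares them with their mean, which is what a mean-squared-error fit would predict there.

```python
import numpy as np


def velocity_samples_near(x_query=0.0, t=0.5, half_width=0.05, n=200_000, seed=0):
    """Return ground-truth velocities of interpolants that land near x_query at time t.

    Toy setup (an assumption for illustration, not the paper's experiment):
    source is N(0, 1), target is uniform on {-1, +1}, coupling is random.
    """
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(n)            # samples from the source distribution
    x1 = rng.choice([-1.0, 1.0], size=n)   # samples from the two-mode target, randomly coupled
    x_t = (1.0 - t) * x0 + t * x1          # linear interpolation between coupled samples
    v = x1 - x0                            # "ground-truth" velocity of each interpolant
    near = np.abs(x_t - x_query) < half_width
    return v[near]


v_near = velocity_samples_near()
mse_optimal = v_near.mean()  # an MSE-trained velocity field predicts the conditional mean

print("samples near query:", v_near.size)
print("fraction with v > 0:", (v_near > 0).mean())
print("smallest |v| observed:", np.abs(v_near).min())
print("MSE-optimal (mean) velocity:", mse_optimal)
```

In this toy setup the velocities near $x_t = 0$ at $t = 0.5$ cluster around $+2$ and $-2$, while their mean is close to $0$, a direction that none of the individual samples takes. This is the averaging effect that a latent-conditioned velocity model is meant to avoid.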
