Stream-level flow matching with Gaussian processes
Ganchao Wei, Li Ma
TL;DR
This work extends conditional flow matching (CFM) by introducing stream-level conditioning, in which latent streams are modeled with Gaussian processes (GPs), yielding GP-CFM. The approach preserves the simulation-free nature of CFM while smoothing the marginal vector field and reducing sampling variance, and it naturally accommodates multiple correlated observations, such as time series. Empirical results on synthetic data, MNIST, CIFAR-10, HWD+, and LFP data show improved sample quality (lower $W_2$, KID, and FID) and smoother transformations, with covariate conditioning further improving performance. The GP-CFM framework is complementary to endpoint-based conditioning (e.g., OT-CFM) and provides a flexible, scalable tool for high-quality generative modeling with CNFs on structured training data.
Abstract
Flow matching (FM) is a family of training algorithms for fitting continuous normalizing flows (CNFs). Conditional flow matching (CFM) exploits the fact that the marginal vector field of a CNF can be learned by least-squares regression on the conditional vector field specified given one or both ends of the flow path. In this paper, we extend the CFM algorithm by defining conditional probability paths along ``streams'', instances of latent stochastic paths that connect data pairs of source and target, which are modeled with Gaussian process (GP) distributions. The unique distributional properties of GPs help preserve the ``simulation-free'' nature of CFM training. We show that this generalization of CFM can effectively reduce the variance in the estimated marginal vector field at a moderate computational cost, thereby improving the quality of the generated samples under common metrics. Additionally, adopting the GP on the streams allows for flexibly linking multiple correlated training data points (e.g., time series). We empirically validate our claims through both simulations and applications to image and neural time series data.
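To make the ``simulation-free'' property concrete, the following is a minimal sketch of why a GP stream gives a position and a velocity in closed form. It assumes a zero-mean GP with a squared-exponential kernel, conditioned on a source point at $t=0$ and a target point at $t=1$, and it computes only the conditional mean of the stream and of its time derivative. The kernel, the length scale `ell`, the `jitter` term, and all function names are illustrative assumptions, not the paper's stated configuration; drawing an actual stream sample would additionally require the conditional covariance.

```python
import numpy as np

def rbf(s, t, ell=1.0):
    """Squared-exponential kernel k(s, t) between two 1-D arrays of times."""
    d = s[:, None] - t[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def drbf_ds(s, t, ell=1.0):
    """Derivative of k(s, t) with respect to its first argument s."""
    d = s[:, None] - t[None, :]
    return -(d / ell**2) * np.exp(-0.5 * (d / ell) ** 2)

def gp_stream_mean(x0, x1, t, ell=1.0, jitter=1e-8):
    """Conditional mean of the stream position and velocity at time t,
    given the endpoints x(0) = x0 and x(1) = x1 (zero-mean GP prior,
    hypothetical choice for illustration)."""
    t_obs = np.array([0.0, 1.0])
    t_new = np.array([float(t)])
    # Gram matrix of the two endpoint times; jitter keeps the solve stable.
    K = rbf(t_obs, t_obs, ell) + jitter * np.eye(2)
    X = np.stack([x0, x1])                 # shape (2, dim)
    W = np.linalg.solve(K, X)              # K^{-1} X, shape (2, dim)
    pos = rbf(t_new, t_obs, ell) @ W       # E[x(t) | x0, x1]
    vel = drbf_ds(t_new, t_obs, ell) @ W   # E[dx/dt (t) | x0, x1]
    return pos[0], vel[0]

# Example: one source/target pair in two dimensions.
x0 = np.array([0.0, 1.0])
x1 = np.array([2.0, -1.0])
pos_mid, vel_mid = gp_stream_mean(x0, x1, t=0.5)
```

Because the derivative of a GP is again a GP, the velocity is obtained from a kernel derivative rather than from integrating an ODE, so a regression target of this kind can be produced without simulating the flow.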
