On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Quentin Bertrand, Anne Gagneux, Mathurin Massias, Rémi Emonet
TL;DR
The paper challenges the view that stochastic targets drive generalization in flow matching, showing that on high-dimensional data the stochasticity of the target contributes little to performance. Building on the closed-form optimal velocity field, it introduces Empirical Flow Matching (EFM), which reduces target stochasticity through a data-driven, unbiased estimator of the closed-form, and shows that this yields comparable or improved generalization on CIFAR-10 and CelebA. The key insight is that generalization depends on how the network approximates the closed-form velocity, with early-time dynamics playing a decisive role, rather than on the presence of stochastic targets throughout training. These results suggest practical routes to more efficient and robust flow-based generative modeling, while highlighting the need to consider the potential societal impacts of high-quality synthetic data.
Abstract
Modern deep generative models can now produce high-quality synthetic samples that are often indistinguishable from real training data. A growing body of research aims to understand why recent methods, such as diffusion and flow matching techniques, generalize so effectively. Among the proposed explanations are the inductive biases of deep learning architectures and the stochastic nature of the conditional flow matching loss. In this work, we rule out the noisy nature of the loss as a key factor driving generalization in flow matching. First, we empirically show that in high-dimensional settings, the stochastic and closed-form versions of the flow matching loss yield nearly equivalent losses. Then, using state-of-the-art flow matching models on standard image datasets, we demonstrate that both variants achieve comparable statistical performance, with the surprising observation that using the closed-form can even improve performance.
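
To make the "closed-form" version concrete, here is a minimal sketch under assumptions not stated in the abstract: the common linear interpolation path x_t = (1 - t) x_0 + t x_1, with x_0 drawn from a standard Gaussian and x_1 drawn uniformly from a finite training set. Under those assumptions, the optimal velocity at a point is a softmax-weighted average of the per-sample conditional velocities (x_i - x) / (1 - t). The function name and array shapes are illustrative and are not taken from the authors' code.

```python
import numpy as np

def closed_form_velocity(x, t, data):
    """Closed-form marginal velocity for a finite training set.

    Assumes x_t = (1 - t) * x_0 + t * x_1 with x_0 ~ N(0, I) and x_1
    uniform over the rows of `data` (an assumption of this sketch).

    x:    array of shape (d,), the query point
    t:    float in [0, 1), the time
    data: array of shape (n, d), the training samples
    """
    # Under the assumed path, p_t(x | x_i) = N(t * x_i, (1 - t)^2 I).
    diffs = x - t * data                                   # (n, d)
    logits = -np.sum(diffs ** 2, axis=1) / (2.0 * (1.0 - t) ** 2)
    logits = logits - logits.max()                         # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum()                      # softmax over samples

    # Conditional velocity toward each training sample.
    cond_velocities = (data - x) / (1.0 - t)               # (n, d)

    # Marginal velocity: posterior-weighted average of conditional velocities.
    return weights @ cond_velocities
```

In contrast, the standard conditional flow matching target for a sampled pair is the single stochastic vector x_1 - x_0; the function above averages over all training samples instead of relying on one draw.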
