Is Flow Matching Just Trajectory Replay for Sequential Data?
Soon Hoe Lim, Shizheng Lin, Michael W. Mahoney, N. Benjamin Erichson
TL;DR
This work analyzes the velocity field learned by flow matching (FM) when applied to sequential data. It shows that the empirical FM optimum corresponds to a training-free, memory-augmented continuous-time dynamical system, whose velocity field decomposes into a global linear part plus a kernel-weighted memory term that replays historical transitions. For Gaussian bridge paths, the resulting sampler is explicit and amenable to ensemble generation, and it connects to Nadaraya-Watson estimation and diffusion-map operators. Empirical results on chaotic dynamical systems show that the training-free FreeFM can rival or surpass trained baselines in conditional and probabilistic forecasting while offering interpretable, data-driven dynamics. The paper highlights the importance of the choice of probability path in FM and suggests future work on scalable, hybrid models that blend nonparametric memory with parametric structure for high-dimensional or nonstationary settings.
Abstract
Flow matching (FM) is increasingly used for time-series generation, but it is not well understood whether it learns general dynamical structure or simply performs an effective "trajectory replay". We study this question by deriving the velocity field targeted by the empirical FM objective on sequential data, in the limit of perfect function approximation. For the Gaussian conditional paths commonly used in practice, we show that the implied sampler is an ODE whose dynamics constitute a nonparametric, memory-augmented continuous-time dynamical system. The optimal field admits a closed-form expression as a similarity-weighted mixture of instantaneous velocities induced by past transitions, making the dataset dependence explicit and interpretable. This perspective positions neural FM models trained by stochastic optimization as parametric surrogates of an ideal nonparametric solution. Using the structure of the optimal field, we study sampling and approximation schemes that improve the efficiency and numerical robustness of ODE-based generation. On nonlinear dynamical system benchmarks, the resulting closed-form sampler yields strong probabilistic forecasts directly from historical transitions, without training.
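To make the "similarity-weighted mixture of instantaneous velocities" concrete, here is a minimal sketch under simplifying assumptions that are ours and not necessarily the paper's. We assume one-step transitions (x_k, x_{k+1}) taken from a single trajectory and a Gaussian conditional path with mean (1 - t) x_k + t x_{k+1} and constant width sigma. Under that assumed path, each transition contributes the constant velocity x_{k+1} - x_k, and the minimizer of the empirical FM loss is the softmax-weighted average of those velocities, which has the form of a Nadaraya-Watson estimator. The names `empirical_fm_velocity` and `sample`, the value of `sigma`, and the Euler integrator are hypothetical choices for illustration.

```python
import numpy as np


def empirical_fm_velocity(x, t, starts, ends, sigma=0.1):
    """Closed-form empirical FM velocity at state x and time t.

    Assumption (ours): conditional path N((1 - t) * start_i + t * end_i,
    sigma^2 I) with constant sigma, so transition i has velocity
    end_i - start_i and the optimum is a kernel-weighted average of them.
    """
    means = (1.0 - t) * starts + t * ends        # (N, d) path means at time t
    velocities = ends - starts                   # (N, d) per-transition velocities
    sq_dist = np.sum((x - means) ** 2, axis=1)   # (N,) squared distances to means
    logits = -sq_dist / (2.0 * sigma ** 2)       # Gaussian kernel in log space
    logits = logits - logits.max()               # shift for a stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum()            # posterior over stored transitions
    return weights @ velocities                  # (d,) memory-weighted velocity


def sample(x0, starts, ends, sigma=0.1, n_steps=50):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with explicit Euler steps."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * empirical_fm_velocity(x, k * dt, starts, ends, sigma)
    return x


# Toy usage: a short 2-D trajectory split into consecutive transitions.
trajectory = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
starts, ends = trajectory[:-1], trajectory[1:]
forecast = sample(np.array([0.0, 0.0]), starts, ends, sigma=0.05)
```

No parameters are fitted here: the stored transitions are the model, and `sigma` controls how sharply the sampler commits to the nearest historical transition. In this sketch a small `sigma` approaches pure replay of the closest transition, while a larger `sigma` blends several of them. A path whose width varies with time would add a term linear in x, which this constant-width sketch omits.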
