Is Flow Matching Just Trajectory Replay for Sequential Data?

Soon Hoe Lim, Shizheng Lin, Michael W. Mahoney, N. Benjamin Erichson

TL;DR

This work analyzes the velocity field that flow matching (FM) targets when applied to sequential data. It shows that the empirical FM optimum corresponds to a training-free, memory-augmented continuous-time dynamical system whose velocity field decomposes into a global linear part plus a kernel-weighted memory term that replays historical transitions. For Gaussian bridge paths, the resulting sampler is explicit and amenable to ensemble generation, and it connects to Nadaraya-Watson estimation and diffusion-map operators. On chaotic dynamical systems, the training-free sampler, FreeFM, can rival or surpass trained baselines in conditional and probabilistic forecasting while offering interpretable, data-driven dynamics. The paper highlights the importance of the probability-path choice in FM and suggests future work on scalable hybrid models that blend nonparametric memory with parametric structure for high-dimensional or nonstationary settings.

Abstract

Flow matching (FM) is increasingly used for time-series generation, but it is not well understood whether it learns a general dynamical structure or simply performs an effective "trajectory replay". We study this question by deriving the velocity field targeted by the empirical FM objective on sequential data, in the limit of perfect function approximation. For the Gaussian conditional paths commonly used in practice, we show that the implied sampler is an ODE whose dynamics constitutes a nonparametric, memory-augmented continuous-time dynamical system. The optimal field admits a closed-form expression as a similarity-weighted mixture of instantaneous velocities induced by past transitions, making the dataset dependence explicit and interpretable. This perspective positions neural FM models trained by stochastic optimization as parametric surrogates of an ideal nonparametric solution. Using the structure of the optimal field, we study sampling and approximation schemes that improve the efficiency and numerical robustness of ODE-based generation. On nonlinear dynamical system benchmarks, the resulting closed-form sampler yields strong probabilistic forecasts directly from historical transitions, without training.

Paper Structure (55 sections, 15 theorems, 171 equations, 13 figures, 1 algorithm)

Key Result

Proposition 1

For the affine conditional flow generated by $v(t,z|X) = a_t(X)z + b_t(X)$ (where $a_t: \mathbb{R}^D \to \mathbb{R}$, $b_t: \mathbb{R}^D \to \mathbb{R}^d$), the (unique) minimizer of the empirical CFM (equivalently FM) objective where the expectation is over $t\sim\mathcal{U}[0,1]$, $X\sim\hat{p}_1$ and $Z_t\sim p_t(\cdot\mid X)$, admits the closed-form expression where the weights $w_j(t, z)$ a
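
To make the idea of a closed-form, weighted minimizer concrete, here is a minimal NumPy sketch. It does not reproduce the expression omitted from the proposition above. It assumes one standard special case instead: the Gaussian conditional path $p_t(z \mid x_j) = \mathcal{N}(t x_j, \sigma_t^2 I)$ with $\sigma_t = 1 - (1 - \sigma_{\min}) t$, for which the conditional velocity is affine in $z$ and the empirical minimizer is a softmax-weighted average of conditional velocities. The function name `empirical_fm_velocity` and the parameter `sigma_min` are hypothetical, chosen for illustration only.

```python
import numpy as np

def empirical_fm_velocity(t, z, data, sigma_min=1e-2):
    """Weighted average of conditional velocities over the training set.

    Assumed path (not taken from the paper): mu_t(x) = t * x and
    sigma_t = 1 - (1 - sigma_min) * t, so that
    v(t, z | x) = (dsigma_t / sigma_t) * (z - mu_t(x)) + x.
    `data` has shape (M, d); `z` has shape (d,).
    """
    sigma_t = 1.0 - (1.0 - sigma_min) * t
    dsigma_t = -(1.0 - sigma_min)        # time derivative of sigma_t
    mu = t * data                        # conditional means, shape (M, d)
    diff = z - mu                        # shape (M, d)

    # Posterior weights w_j(t, z): softmax of Gaussian log-likelihoods.
    logits = -0.5 * np.sum(diff**2, axis=1) / sigma_t**2
    logits -= logits.max()               # stabilize the exponentials
    w = np.exp(logits)
    w /= w.sum()

    # Conditional velocities, affine in z: a_t * z + b_t(x_j).
    cond_v = (dsigma_t / sigma_t) * diff + data
    return w @ cond_v, w
```

Under this assumed path the coefficient of $z$ is the same for every data point, so the weighted sum splits into a linear term in $z$ plus a weighted combination of data-dependent offsets. As $\sigma_t$ shrinks near $t = 1$, the weights concentrate on the nearest training points.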

Figures (13)

  • Figure 1: What Dynamical System Are FM Forecasters Actually Sampling From? For sequential data, optimal empirical FM induces a nonparametric, memory-augmented ODE, enabling training-free forecasting by replaying historical transitions. Inspired by this theoretical insight, we propose an ODE sampler $\frac{dZ_t}{dt} = G_t Z_t + h(t, Z_t; \mathcal{D}_M)$ (see \ref{eq:CFM-memory-ODE}), where the velocity field combines a global linear drift $G_t Z_t$ with a data-adaptive nonlinear memory term $h$. This nonlinear forcing is computed by attending to residual velocities $y_j(t)$, weighted by a kernel attention mechanism $\alpha_j(t,z)$. By initializing $Z_0$ from a Gaussian distribution around the current state $x_\tau$, integrating this ODE gives a next-step forecast $Z_1 \approx x_{\tau+1}$, and the method inherently supports generating an ensemble of forecasts to quantify uncertainty.
  • Figure 2: Conditional Forecast. (a) Examples of conditional forecasts generated by FreeFM and baseline models for 20 trajectories from the Aizawa attractor. Each trajectory originates from a different initial condition. (b) sMAPE and VPT of conditional forecast results from FreeFM and baseline models. Shaded regions indicate ±0.5 standard error over 135 dynamical systems, each with 20 trajectories originating from randomly sampled initial conditions.
  • Figure 3: Probabilistic Forecast. (a)-(b) Examples of probabilistic forecasts generated by FreeFM and a fully trained vanilla flow matching model for time series from Lorenz-63. Shaded regions indicate the standard error over 50 Monte Carlo simulations. (c) sMAPE and CRPS of probabilistic forecasts from FreeFM and the fully trained vanilla flow matching model. Shaded regions indicate 0.5 standard error over 135 dynamical systems with 20 random initial conditions and 50 Monte Carlo simulations.
  • Figure 4: Long-Term Attractor Reconstruction. (a) Correlation dimensions of the long-term attractor reconstructions from FreeFM and baseline models. (b) KL divergence of the long-term attractor reconstructions from FreeFM and baseline models. Shaded regions indicate 0.5 standard error. Both results are computed over 135 dynamical systems, each with 20 trajectories originating from 20 random initial conditions.
  • Figure 5: Illustration of Dynamical Measure Transport via Flow Matching (FM). The schematic depicts the continuous transport of a probability measure from a source to a target distribution. (Left) The process initializes with a standard Gaussian source measure $p_0(z) = \mathcal{N}(0, I)$. (Middle) The FM objective defines a vector field $v_t(z)$ that drives the transport. The resulting ODE flow $dz/dt = v_t(z)$ pushes the probability mass along time-dependent trajectories, creating a probability path $p_t(z)$ that undergoes a topological bifurcation (splitting from one mode to two). (Right) The measure is successfully transported to the target bimodal density $p_1(z)$, with samples settling at the modes $\pm m$.
  • ...and 8 more figures
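
The sampler described in the Figure 1 caption can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the paper's algorithm: each stored transition $(x_j, x_{j+1})$ is assumed to define a Gaussian bridge with mean $(1-t)x_j + t x_{j+1}$ and standard deviation $\sigma_t = (1-t)\sigma_0 + t\sigma_1$, so that $G_t = (\dot\sigma_t/\sigma_t) I$, the residual velocities are $y_j(t) = (x_{j+1} - x_j) - (\dot\sigma_t/\sigma_t)\mu_t^j$, and the attention weights $\alpha_j$ are a softmax over squared distances. The names `memory_velocity`, `forecast_ensemble`, `sigma0`, and `sigma1` are hypothetical, and forward Euler stands in for whichever ODE solver is actually used.

```python
import numpy as np

def memory_velocity(t, z, x_now, x_next, sigma0, sigma1):
    """Velocity G_t z + h(t, z) under the assumed Gaussian bridge.

    z: (N, d) ensemble states; x_now, x_next: (M, d) stored transitions.
    """
    sigma_t = (1.0 - t) * sigma0 + t * sigma1
    g_t = (sigma1 - sigma0) / sigma_t          # scalar, G_t = g_t * I
    mu = (1.0 - t) * x_now + t * x_next        # bridge means, (M, d)

    # Kernel attention alpha_j(t, z): softmax of scaled squared distances.
    sq = np.sum((z[:, None, :] - mu[None, :, :]) ** 2, axis=-1)  # (N, M)
    logits = -0.5 * sq / sigma_t**2
    logits -= logits.max(axis=1, keepdims=True)
    alpha = np.exp(logits)
    alpha /= alpha.sum(axis=1, keepdims=True)

    # Residual velocities y_j(t), then the memory term h = sum_j alpha_j y_j.
    y = (x_next - x_now) - g_t * mu            # (M, d)
    return g_t * z + alpha @ y

def forecast_ensemble(x_tau, x_now, x_next, n_members=32, n_steps=50,
                      sigma0=0.1, sigma1=0.05, seed=0):
    """Sample Z_0 around x_tau and integrate to t = 1 with forward Euler."""
    rng = np.random.default_rng(seed)
    z0 = x_tau + sigma0 * rng.standard_normal((n_members, x_tau.shape[0]))
    z = z0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        z = z + dt * memory_velocity(k * dt, z, x_now, x_next, sigma0, sigma1)
    return z0, z   # initial ensemble and next-step forecasts Z_1
```

The spread of the returned ensemble gives a rough uncertainty estimate for the next step. Each velocity evaluation costs on the order of $N \cdot M \cdot d$ operations for $N$ ensemble members and $M$ stored transitions, which is why truncating the sum to nearby transitions is a natural approximation.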

Theorems & Definitions (29)

  • Proposition 1: Closed-Form Empirical FM
  • Proposition 2: Lipschitz bound
  • Proposition 3: Truncation error
  • Proposition 4: Coupled empirical affine FM/CFM minimizer
  • Proof: Proof of Proposition 4
  • Example 1: Empirical Rectified Flow
  • Example 2: Empirical Affine Flows
  • Proposition 5
  • Example 3: Empirical Affine Flows: KDE-to-KDE Transport
  • Corollary 1: KDE-to-KDE affine transport with coupling
  • ...and 19 more