A Long-Short Flow-Map Perspective for Drifting Models

Zhiqi Li; Bo Zhu

Abstract

This paper provides a reinterpretation of the Drifting Model~\cite{deng2026generative} through a semigroup-consistent long-short flow-map factorization. We show that a global transport process can be decomposed into a long-horizon flow map followed by a short-time terminal flow map admitting a closed-form optimal velocity representation, and that taking the terminal interval length to zero recovers exactly the drifting field together with a conservative impulse term required for flow-map consistency. Based on this perspective, we propose a new likelihood learning formulation that aligns the long-short flow-map decomposition with density evolution under transport. We validate the framework through both theoretical analysis and empirical evaluations on benchmark tests, and further provide a theoretical interpretation of the feature-space optimization while highlighting several open problems for future study.

Paper Structure (40 sections, 6 theorems, 64 equations, 9 figures, 1 table)

Introduction
Background
Closed-Form Flow Matching
One-Step Flow Map
Long-Short Flow-Map Perspective
First-Order Approximation
Second-Order Approximation
Connection with Drifting Model
Drifting Model
Connection of Loss Functions
Discussion on Design Choices
Application: Likelihood Learning
Background
Drifting Model for Likelihood Learning
Sampling Process for Lagrangian Likelihood Learning
...and 25 more sections

Key Result

Theorem 2.1

The Flow Matching loss (\ref{eq:FlowMatchingLoss}) admits an optimal solution, which in turn has a closed-form expression (see \ref{sec:proof_close_form1} for the proof).
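To make the kernel-weighted structure of this closed form concrete, here is a minimal sketch in plain Python. It does not use the paper's notation, which is not reproduced on this page. It assumes the common setting of a standard Gaussian source, the linear interpolant $x_t = (1-t)x_0 + t x_1$, and an empirical data distribution over a finite set of points. Under those assumptions the optimal velocity is a softmax-weighted average of the per-point velocities $(x_i - x)/(1-t)$. The function name `closed_form_velocity` and the toy dataset are illustrative only.

```python
import math

def closed_form_velocity(x, t, data):
    """Kernel-weighted average of conditional velocities (x_i - x) / (1 - t).

    Assumption (not taken from the paper): x_t = (1 - t) * x0 + t * x1 with
    x0 ~ N(0, I) and x1 drawn uniformly from the finite set `data`.
    """
    if not 0.0 <= t < 1.0:
        raise ValueError("t must lie in [0, 1)")
    s = 1.0 - t
    # Log-weights of the Gaussian kernel N(x; t * x_i, s^2 I), up to a shared constant.
    logits = [
        -sum((xj - t * xij) ** 2 for xj, xij in zip(x, xi)) / (2.0 * s * s)
        for xi in data
    ]
    m = max(logits)  # subtract the max for a numerically stable softmax
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    weights = [w / z for w in weights]
    # Weighted sum of the per-point conditional velocities.
    return [
        sum(w * (xi[d] - x[d]) for w, xi in zip(weights, data)) / s
        for d in range(len(x))
    ]

# Hypothetical two-point dataset in 2D.
data = [(1.0, 0.0), (-1.0, 0.0)]
v_mid = closed_form_velocity((0.0, 0.0), 0.5, data)   # equidistant from both scaled points
v_near = closed_form_velocity((0.9, 0.0), 0.9, data)  # sits on t * data[0]
```

In this toy case the two contributions cancel at the midpoint, while close to $t=1$ the kernel is sharp and a single data point dominates the average. Every evaluation still loops over the full dataset, which is the cost that grows when the kernel is diffuse.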

Figures (9)

Figure 1: (a) Closed-form Flow Matching (\ref{sec:close_form}) requires kernel-weighted aggregation over data points, which becomes prohibitively expensive when the kernel is diffuse (far from $t=1$). (b) Flow-map training (\ref{sec:flow_map}) enforces trajectory consistency but is difficult to supervise directly from data, so spurious consistent paths (e.g., the semi-transparent segment) can still satisfy the constraint. (c) Our Long-Short Flow Map method (\ref{sec:long_short_flow_map}) applies a closed-form estimator on the short step $\psi_{1-\Delta t\to 1}$ as $\Delta t\to 0$ (\ref{sec:first_order} first order, \ref{sec:second_order} second order), which provides dataset-grounded supervision for learning the long step $\psi_{0\to 1-\Delta t}$ and thereby addresses the challenges of both closed-form Flow Matching and flow-map methods.
Figure 2: Overview of the Long-Short Flow Map framework. Leveraging flow-map trajectory consistency, we decompose the full map $\psi_{0\to 1}$ into a long map $\psi_{0\to 1-\Delta t}$ and a short map $\psi_{1-\Delta t\to 1}$. The short map is approximated using forward Euler or the trapezoidal rule and then computed via the closed-form solution in flow matching, providing dataset-level supervision for learning the long map $\psi_{0\to 1-\Delta t}$. Taking the limit $\Delta t \to 0$ yields a long-short flow-map derivation of the Drifting Model.
Figure 3: Unconditional latent generation results on CelebA-HQ. We train for 100K steps using feature-space optimization with a pretrained MAE as the feature extractor and achieve an FID of 14.71. For comparison, the corresponding MeanFlow model reaches an FID of 12.4 after 400K training steps. Full results are provided in \ref{tab:bs_steps_fid}.
Figure 4: 2D Examples. We show the ground-truth samples, generated samples, and learned likelihoods from the Eulerian and Lagrangian views on Spiral, Checkerboard, and Two Moons.
Figure 5: Image generation. FID trajectories under different kernels and batch sizes, together with an ablation on using multiple feature spaces. The Laplacian kernel corresponds to the first-order kernel used in the Drifting Model~\cite{deng2026generative}, whereas the Gaussian kernel is the second-order kernel derived in our framework. Overall, the Gaussian kernel performs comparably to the Laplacian kernel and can be better in some regimes. In the ablation (batch size $B{=}64$, learning rate $5\times 10^{-5}$), we compare three settings: the default feature set with thousands of channels, only four features (the outputs at the four encoder resolutions), and operating directly in the original space. The latter two settings fail to produce high-quality results.
...and 4 more figures

Theorems & Definitions (11)

Theorem 2.1: Closed-Form Solution of Flow Matching~\cite{bertrand2025closed}
Proposition 1: Trajectory Consistency
Theorem 3.1: Closed-Form Solution of the Endpoint Velocity
Theorem 4.1: Closed-Form Solution for Velocity Divergence
Theorem 5.1: Feature-Data Space Optimal Velocity Relation
proof
Theorem A.1: Non-Existence of Conditional Flow Maps~\cite{li2026trajectory}
proof
proof
proof
...and 1 more
