
FlowConsist: Make Your Flow Consistent with Real Trajectory

Tianyi Zhang, Chengcheng Liu, Jinwei Chen, Chun-Le Guo, Chongyi Li, Ming-Ming Cheng, Bo Li, Peng-Tao Jiang

TL;DR

FlowConsist addresses two core problems in fast-flow generative models: trajectory drift caused by training on conditional velocities, and the accumulation of approximation errors along long trajectories. It replaces conditional velocities with the model's own marginal velocity so that training follows a single, consistent ODE path. It also adds a trajectory rectification mechanism that aligns the generated marginals with real data at every time step, including a marginal-velocity alignment term that uses an auxiliary predictor. The approach achieves state-of-the-art 1-NFE results on ImageNet 256×256 (FID 1.52). It substantially outperforms prior fast-flow methods and is competitive with multi-step diffusion models. The work provides both a theoretical analysis and a practical training framework that improves fast-flow models without teacher distillation. The framework applies across architectures; the experiments include MeanFlow and Consistency Models.

Abstract

Fast flow models accelerate the iterative sampling process by learning to directly predict ODE path integrals, enabling one-step or few-step generation. However, we argue that current fast-flow training paradigms suffer from two fundamental issues. First, conditional velocities constructed from randomly paired noise-data samples introduce systematic trajectory drift, preventing models from following a consistent ODE path. Second, the model's approximation errors accumulate over time steps, leading to severe deviations across long time intervals. To address these issues, we propose FlowConsist, a training framework designed to enforce trajectory consistency in fast flows. As a principled alternative to conditional velocities, FlowConsist uses the marginal velocities predicted by the model itself, aligning optimization with the true trajectory. To further counter error accumulation over time steps, we introduce a trajectory rectification strategy that aligns the marginal distributions of generated and real samples at every time step along the trajectory. Our method establishes a new state of the art on ImageNet 256×256, achieving an FID of 1.52 with a single sampling step.
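For reference, the distinction between conditional and marginal velocities can be written out under the linear (rectified-flow) interpolation used by fast-flow models such as MeanFlow, with $t=0$ at the data and $t=1$ at the noise. This parameterization is an assumption for illustration; the paper's own notation and equations may differ.

$$
\begin{aligned}
x_t &= (1-t)\,x + t\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),\\
v_t(x_t \mid x, \epsilon) &= \frac{\mathrm{d}x_t}{\mathrm{d}t} = \epsilon - x,\\
u_t(x_t) &= \mathbb{E}\left[\, v_t(x_t \mid x, \epsilon) \mid x_t \,\right] = \mathbb{E}\left[\, \epsilon - x \mid x_t \,\right],\\
\frac{\mathrm{d}x_\tau}{\mathrm{d}\tau} &= u_\tau(x_\tau),\\
x_s &= x_t - \int_s^t u_\tau(x_\tau)\,\mathrm{d}\tau, \qquad 0 \le s < t \le 1.
\end{aligned}
$$

A conditional path keeps the constant velocity $\epsilon - x$ of its own noise-data pair, while $u_t$ averages over every pair that passes through $x_t$, so the two generally differ. A fast flow learns to predict the integral in the last line directly; with $s = 0$ this gives one-step (1-NFE) generation.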

Paper Structure (16 sections, 35 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: (a) Marginal flow field and conditional paths. A conditional path (blue line) and its conditional velocity $v_t(x_t|x, \epsilon)$ are constructed by pairing randomly sampled noise $\epsilon$ with a data sample $x$. Marginalizing over all possible conditional paths yields the marginal flow field (gray lines). A randomly paired conditional path intersects multiple distinct marginal paths. (b) We construct a conditional path (blue line) by perturbing the image $x_0$ with randomly sampled noise $\epsilon$. Starting from various time steps along this conditional path, we trace marginal paths (dashed lines) with a pre-trained flow model and obtain distinct data samples. This demonstrates that the conditional velocity induces drift across marginal trajectories, causing them to deviate from the ground-truth trajectory (green) toward erroneous trajectories (red).
  • Figure 2: 1-NFE generation results of FlowConsist-XL/2 on ImageNet 256×256. Additional uncurated results are provided in the Appendix.
  • Figure 3: (a) Comparison between conditional and marginal paths. We construct a conditional path by adding random noise to an image $x_0$, then trace marginal paths $\tau_i$ with a pre-trained flow model, starting from various time steps along this conditional path. Comparing the data samples $x_0^{\tau_i}$ of these trajectories shows that different points on a single conditional path belong to divergent marginal paths $\tau$. (b) Mean squared error (MSE) between $x_0$ and each $x_0^{\tau_i}$, plotted against the starting time step. The discrepancy between the data samples of the conditional and marginal paths grows substantially as the time span increases. (c) MSE between the marginal velocity $u_t$ and the conditional velocity $v_t$ at various $x_t$ along the conditional path, plotted against the time step. At every $x_t$ there is a persistent discrepancy between $u_t$ and $v_t$.
  • Figure 4: (a) MSE between the 1-NFE samples mapped from various time steps along the marginal trajectory and the corresponding ground-truth data samples, plotted against the starting time step. This quantifies the cumulative error inherent in single-step mapping from different time steps. (b) The $L_2$ norm of the training objective in Eq. \ref{eq:eq7} during actual training, with $s=0$ fixed, plotted against the starting time step $t$.
  • Figure 5: FID versus CFG scale under different configurations. (a) With MeanFlow (MF) as the model architecture: FID versus CFG scale after sequentially applying CFG conditioning (best FID: 5.28), the trajectory consistency loss in Eq. \ref{eq:eq7} (best FID: 4.64), and the marginal velocity alignment loss in Eq. \ref{eq:eq11} (best FID: 4.31). (b) With Consistency Models (CM) as the architecture: the best FID scores for CFG conditioning, the trajectory consistency loss, and the marginal velocity alignment loss are 6.74, 5.84, and 5.36, respectively.
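The drift described in Figures 1(b) and 3 can be reproduced in a one-dimensional toy problem. The sketch below is not the authors' experiment. It rests on assumptions chosen so that everything has a closed form. The data are the two points $\pm 1$ with equal probability, the noise is $\epsilon \sim \mathcal{N}(0, 1)$, and the path is the linear interpolation $x_t = (1-t)x + t\epsilon$ with $t=0$ at the data. Under these assumptions $\mathbb{E}[x \mid x_t = z] = \tanh((1-t)z/t^2)$. The marginal velocity is therefore $u_t(z) = (z - \tanh((1-t)z/t^2))/t$. The helper names (`posterior_mean`, `marginal_velocity`, `integrate_marginal_ode`) are hypothetical.

The script builds one conditional path from $x = +1$ and $\epsilon = -3$. It starts the marginal ODE from several points on that path and records where each run lands.

```python
import numpy as np

# Toy setup (assumption, not from the paper): data x in {-1, +1} with equal
# probability, noise eps ~ N(0, 1), linear path x_t = (1 - t) * x + t * eps,
# with t = 0 at the data and t = 1 at the noise.


def posterior_mean(z, t):
    """E[x | x_t = z] for the two-point toy data (closed form)."""
    return np.tanh((1.0 - t) * z / t**2)


def marginal_velocity(z, t):
    """u_t(z) = E[eps - x | x_t = z] = (z - E[x | x_t = z]) / t."""
    return (z - posterior_mean(z, t)) / t


def integrate_marginal_ode(z, t_start, n_steps=1000):
    """Euler-integrate dz/dt = u_t(z) backward from t_start down to t = 0."""
    ts = np.linspace(t_start, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # Velocity is evaluated at t_cur > 0, so the 1/t term never divides by zero.
        z = z + (t_next - t_cur) * marginal_velocity(z, t_cur)
    return z


# One randomly paired conditional path: data point x = +1, noise eps = -3.
x, eps = 1.0, -3.0
v_cond = eps - x  # conditional velocity, constant along this path

results = []
for t in [0.05, 0.1, 0.2, 0.3, 0.5, 0.9]:
    z_t = (1.0 - t) * x + t * eps            # point on the conditional path
    x0_tau = integrate_marginal_ode(z_t, t)  # endpoint of the marginal path through z_t
    sq_err = (x0_tau - x) ** 2               # endpoint discrepancy, cf. Figure 3(b)
    vel_gap = (marginal_velocity(z_t, t) - v_cond) ** 2  # velocity gap, cf. Figure 3(c)
    results.append((t, float(x0_tau), float(sq_err), float(vel_gap)))
    print(f"t={t:.2f}  x_t={z_t:+.2f}  x0_tau={float(x0_tau):+.3f}  "
          f"sq_err={float(sq_err):.3f}  vel_gap={float(vel_gap):.3f}")
```

Starting points before $t = 0.25$, where this conditional path crosses $x_t = 0$, return to the paired data point $+1$. Later starting points land on $-1$, so the endpoint error jumps from 0 to 4. This is a 1-D analogue of the growing discrepancy in Figure 3(b).

In this two-point toy, the velocity gap is nearly zero at small $t$, where the posterior over $x$ is already certain, and large at intermediate $t$. Euler integration is used only for simplicity. With this parameterization, each Euler step is a convex combination of $z$ and $\mathbb{E}[x \mid x_t = z]$, so a trajectory never changes sign.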