
Improved Mean Flows: On the Challenges of Fastforward Generative Models

Zhengyang Geng, Yiyang Lu, Zongze Wu, Eli Shechtman, J. Zico Kolter, Kaiming He

TL;DR

This work tackles two core limitations of MeanFlow: a training target that depends on the network itself, and a classifier-free guidance (CFG) scale that is fixed at training time. By reformulating MF as a $v$-loss re-parameterized through the $u$-prediction network, and by treating CFG as an explicit conditioning input handled through in-context conditioning, the authors obtain a stable, flexible one-step generative framework. Improved MeanFlow (iMF) achieves a record 1-NFE FID of 1.72 on ImageNet 256×256 without distillation, with substantial gains over the original MF and other fastforward approaches. These advances position fastforward generative models as a competitive stand-alone paradigm with practical benefits for flexible guidance and conditioning-based control.
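The reformulation can be written out in a few lines. This is a sketch that assumes the original MeanFlow conventions, which are not restated on this page: $z_t = (1-t)\,x + t\,e$ interpolates data $x$ (at $t=0$) and noise $e$ (at $t=1$), the conditional velocity is $e - x$, $\mathrm{sg}(\cdot)$ is stop-gradient, and $\mathtt{JVP}_{e-x} = (e-x)\,\partial_z u_\theta + \partial_t u_\theta$ is the total derivative of $u_\theta$ along the tangent $(e-x, 0, 1)$ in $(z, r, t)$.

$$
\begin{aligned}
u(z_t, r, t) &= \frac{1}{t-r}\int_r^t v(z_\tau, \tau)\,d\tau, \qquad z_r = z_t - (t-r)\,u(z_t, r, t), \\
u(z_t, r, t) &= v(z_t, t) - (t-r)\,\frac{d}{dt}u(z_t, r, t), \qquad \frac{d}{dt}u = v\,\partial_z u + \partial_t u, \\
\mathcal{L}_{\mathrm{MF}} &= \big\| u_\theta(z_t, r, t) - \mathrm{sg}\big((e-x) - (t-r)\,\mathtt{JVP}_{e-x}\big) \big\|^2 \\
&= \big\| \underbrace{u_\theta(z_t, r, t) + (t-r)\,\mathrm{sg}\big(\mathtt{JVP}_{e-x}\big)}_{V_\theta} - (e-x) \big\|^2 .
\end{aligned}
$$

The second line is the MeanFlow identity; the first line's update rule with $r=0$, $t=1$ gives the one-step (1-NFE) sampler $z_0 = z_1 - u_\theta(z_1, 0, 1)$. The last line is a $v$-loss on the compound prediction $V_\theta$. Because its JVP tangent is $e - x$, $V_\theta$ depends on the conditional velocity and not only on $z_t$, and at $t = r$ the JVP term drops out entirely.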

Abstract

MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its "fastforward" nature introduces key challenges in both the training objective and the guidance mechanism. First, the original MF's training target depends not only on the underlying ground-truth fields but also on the network itself. To address this issue, we recast the objective as a loss on the instantaneous velocity $v$, re-parameterized by a network that predicts the average velocity $u$. Our reformulation yields a more standard regression problem and improves the training stability. Second, the original MF fixes the classifier-free guidance scale during training, which sacrifices flexibility. We tackle this issue by formulating guidance as explicit conditioning variables, thereby retaining flexibility at test time. The diverse conditions are processed through in-context conditioning, which reduces model size and benefits performance. Overall, our improved MeanFlow (iMF) method, trained entirely from scratch, achieves 1.72 FID with a single function evaluation (1-NFE) on ImageNet 256$\times$256. iMF substantially outperforms prior methods of this kind and closes the gap with multi-step methods while using no distillation. We hope our work will further advance fastforward generative modeling as a stand-alone paradigm.
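The claim that the original MF objective is a $v$-loss in disguise can be checked numerically. The following is a minimal pure-Python sketch under stated assumptions: a toy three-dimensional "network" `u_theta(z, r, t) = tanh(A z + a t + c r)` with a hand-derived JVP stands in for a real Transformer, the interpolation is $z_t = (1-t)x + te$ as in original MeanFlow, and only loss values are computed (no gradients), so stop-gradient appears only in comments. All names are hypothetical. In `imf_v_loss`, the choice of tangent (the model's own estimate `u_theta(z, t, t)`) is an assumption for illustration, not necessarily what the paper does.

```python
import math
import random

random.seed(0)
D = 3  # toy data dimension (hypothetical)
A = [[random.gauss(0, 0.5) for _ in range(D)] for _ in range(D)]  # toy weights (hypothetical)
a = [random.gauss(0, 1) for _ in range(D)]  # toy weight on t
c = [random.gauss(0, 1) for _ in range(D)]  # toy weight on r


def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def interp(x, e, t):
    """z_t = (1 - t) x + t e: data at t = 0, noise at t = 1."""
    return [(1 - t) * xi + t * ei for xi, ei in zip(x, e)]


def pre(z, r, t):
    """Pre-activation h = A z + a t + c r of the toy model."""
    return [hz + ai * t + ci * r for hz, ai, ci in zip(matvec(A, z), a, c)]


def u_theta(z, r, t):
    """Toy average-velocity model u_theta(z, r, t) = tanh(A z + a t + c r)."""
    return [math.tanh(h) for h in pre(z, r, t)]


def jvp_u(z, r, t, dz, dr, dt):
    """Exact directional derivative of u_theta at (z, r, t) along the tangent (dz, dr, dt)."""
    lin = [hz + ai * dt + ci * dr for hz, ai, ci in zip(matvec(A, dz), a, c)]
    return [(1 - math.tanh(h) ** 2) * li for h, li in zip(pre(z, r, t), lin)]


def mse(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)


def compound_v(z, r, t, tangent):
    """V_theta = u_theta + (t - r) * JVP along (tangent, 0, 1); the JVP term would be stop-gradient."""
    du_dt = jvp_u(z, r, t, tangent, 0.0, 1.0)
    return [ui + (t - r) * di for ui, di in zip(u_theta(z, r, t), du_dt)]


def mf_u_loss(x, e, r, t):
    """Original MF: regress u_theta onto sg((e - x) - (t - r) * du/dt), with tangent e - x."""
    z, v_c = interp(x, e, t), [ei - xi for xi, ei in zip(x, e)]
    du_dt = jvp_u(z, r, t, v_c, 0.0, 1.0)
    u_tgt = [vi - (t - r) * di for vi, di in zip(v_c, du_dt)]
    return mse(u_theta(z, r, t), u_tgt)


def mf_v_loss(x, e, r, t):
    """The same objective as a v-loss: V_theta still takes e - x as its JVP tangent."""
    z, v_c = interp(x, e, t), [ei - xi for xi, ei in zip(x, e)]
    return mse(compound_v(z, r, t, v_c), v_c)


def imf_v_loss(x, e, r, t):
    """iMF-style v-loss (sketch): the tangent is the model's own v estimate, so V_theta sees only z.
    ASSUMPTION: v is estimated as u_theta(z, t, t), using the boundary case u(z, t, t) = v(z, t)."""
    z, v_c = interp(x, e, t), [ei - xi for xi, ei in zip(x, e)]
    return mse(compound_v(z, r, t, u_theta(z, t, t)), v_c)


x, e = [0.5, -1.0, 0.3], [1.2, 0.4, -0.7]
print(mf_u_loss(x, e, 0.2, 0.7), mf_v_loss(x, e, 0.2, 0.7))  # equal up to float rounding
print(imf_v_loss(x, e, 0.2, 0.7))
```

The first two losses agree for every input because the stop-gradient target can be moved algebraically to the prediction side. Only the third changes what the compound function depends on: its JVP tangent is computed from $z$ alone rather than from $e - x$.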

Paper Structure

This paper contains 37 sections, 17 equations, 9 figures, 4 tables, 2 algorithms.

Figures (9)

  • Figure 1: Conceptual comparison. Original MeanFlow (MF) [mf] predicts the average velocity $u$ with a network $u_\theta$. Because the ground-truth $u$ is unknown, original MF substitutes $u$ with the network's own prediction. We show that the original MF objective is equivalent to a loss on the instantaneous velocity $v$ (namely, $v$-loss), re-parameterized by the neural network $u_\theta$ (namely, $u$-pred), as shown in (a). This re-parameterization, enclosed in the gray box, is determined by the MeanFlow identity [mf]. The reformulation reveals that the input to the compound function (in the gray box) is not only the noisy data (here, $z$) but also the conditional velocity ($e - x$), so the problem is not a standard regression. In (b), our improved objective is conceptually a $v$-loss re-parameterized by $u$-pred, taking only the legitimate input $z$.
  • Figure 2: MeanFlow as $v$-loss. Original MeanFlow (MF) [mf] models the average velocity $u$ and trains the network $u_\theta$ via a $u$-loss parameterized by $u_\theta$ itself. We show that MF can be reformulated as a $v$-loss re-parameterized by $u_\theta$, driven by the MeanFlow identity.
  • Figure 3: Training losses. We report the loss only on samples with $t \neq r$: a batch also contains samples with $t = r$, for which the $\mathtt{JVP}$ term vanishes because its coefficient is $(t-r)$. Both MF and iMF can be viewed as a $v$-loss, using different forms of the compound function $V_\theta$. Original MF's loss is non-decreasing and has high variance. (Settings: MeanFlow-B/2, trained with a basic $\ell_2$ loss, no adaptive weighting, and no CFG.)
  • Figure 4: Optimal CFG scales shift under different settings. In general, a stronger setting, such as more training epochs (left) or more inference steps (right), has a smaller optimal CFG scale. This investigation is enabled by our flexible CFG-conditioning, with which a single model supports varying CFG scales even in the single/few-NFE case. (Settings: iMF-B/2 on ImageNet 256$\times$256.)
  • Figure 5: Improved in-context conditioning. Each type of condition is turned into multiple tokens, which are concatenated with the image latent tokens along the sequence axis. This accommodates the conditions of time steps $(r,t)$, class $\mathbf{c}$, and guidance-related factors $\Omega$ (CFG scale $\omega$ and CFG intervals). Importantly, we do not use adaLN-zero for conditioning, which significantly reduces the model size (number of parameters) while maintaining performance.
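The in-context conditioning of Figure 5 can be illustrated with a minimal pure-Python sketch of the data flow. Everything concrete here is hypothetical: the token width `D_MODEL`, the number of tokens per condition, the sinusoidal scalar embedding, the random per-slot offsets used to turn one embedding into several tokens, representing the CFG interval as two scalars `(t_min, t_max)`, and the extra "null" class row used for the unconditional branch. The only point taken from the caption is that each condition becomes several tokens concatenated with the image latent tokens along the sequence axis, so no adaLN-zero modulation layers are involved.

```python
import math
import random

D_MODEL = 8          # token width (hypothetical)
TOKENS_PER_COND = 4  # tokens produced per condition (hypothetical)
NUM_CLASSES = 1000   # ImageNet classes; row NUM_CLASSES is a hypothetical "null" class
SCALAR_CONDS = ("r", "t", "omega", "t_min", "t_max")  # omega = CFG scale; (t_min, t_max) = CFG interval

random.seed(0)
# Hypothetical "learnable" parameters, randomly initialised for illustration only.
class_table = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(NUM_CLASSES + 1)]
slot_offsets = {
    name: [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(TOKENS_PER_COND)]
    for name in SCALAR_CONDS + ("class",)
}


def sinusoidal(x, dim=D_MODEL, max_period=10000.0):
    """Sinusoidal embedding of a scalar: sin/cos at geometrically spaced frequencies."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(x * f) for f in freqs] + [math.cos(x * f) for f in freqs]


def expand(name, vec):
    """Turn one condition embedding into TOKENS_PER_COND tokens by adding per-slot offsets."""
    return [[v + o for v, o in zip(vec, off)] for off in slot_offsets[name]]


def build_sequence(image_tokens, r, t, class_id, omega, interval):
    """Concatenate condition tokens and image latent tokens along the sequence axis.

    class_id=None selects the hypothetical null class (unconditional branch).
    """
    values = {"r": r, "t": t, "omega": omega, "t_min": interval[0], "t_max": interval[1]}
    cond_tokens = []
    for name in ("r", "t"):  # time steps (r, t)
        cond_tokens += expand(name, sinusoidal(values[name]))
    idx = NUM_CLASSES if class_id is None else class_id
    cond_tokens += expand("class", class_table[idx])  # class c
    for name in ("omega", "t_min", "t_max"):  # guidance-related factors Omega
        cond_tokens += expand(name, sinusoidal(values[name]))
    return cond_tokens + image_tokens


image = [[0.0] * D_MODEL for _ in range(16)]  # 16 placeholder image latent tokens
seq = build_sequence(image, r=0.2, t=0.7, class_id=3, omega=2.5, interval=(0.1, 0.9))
print(len(seq))  # 6 conditions * 4 tokens + 16 image tokens
```

A Transformer would then apply ordinary self-attention over the whole sequence and read its prediction from the image-token positions. Because $\omega$ is an input rather than a constant folded into the training target, the same weights can be queried at different CFG scales, which is what lets a single model support the sweeps in Figure 4.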