Improving the Training of Rectified Flows
Sangyun Lee, Zinan Lin, Giulia Fanti
TL;DR
The paper tackles the high sampling cost of diffusion models by focusing on rectified flows and the Reflow training algorithm. It shows that, in practical settings, a single Reflow iteration is enough to produce nearly straight ODE trajectories when combined with targeted training improvements, enabling competitive sampling with 1-2 NFEs. By introducing a U-shaped timestep distribution, an LPIPS-Huber premetric, initialization from a pretrained diffusion model, and the incorporation of real data, the authors achieve state-of-the-art or competitive FID on CIFAR-10 and ImageNet 64x64 with 1-2 NFEs, and they also provide insights into sampling efficiency and inversion. The work argues for more effective one-round training of rectified flows as a practical alternative to distillation methods in the low-NFE regime, with implications for fast, invertible generative modeling.
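To illustrate what a U-shaped timestep distribution means in practice, the following is a minimal sketch that samples training timesteps with more probability mass near t = 0 and t = 1 than in the middle of the interval. The density used here, proportional to cosh(a (t - 0.5)), the parameter `a`, and the function name `sample_u_shaped_t` are assumptions chosen for illustration and are not necessarily the form used by the authors.

```python
import math
import random

def sample_u_shaped_t(a=4.0, rng=random):
    """Draw t in [0, 1] with density proportional to cosh(a * (t - 0.5)).

    Hypothetical U-shaped density chosen for illustration; larger `a`
    pushes more mass toward the two endpoints. Sampling uses the closed-form
    inverse CDF: F(t) = (sinh(a * (t - 0.5)) + sinh(a / 2)) / (2 * sinh(a / 2)).
    """
    u = rng.random()  # uniform draw in [0, 1)
    return 0.5 + math.asinh((2.0 * u - 1.0) * math.sinh(a / 2.0)) / a

# Compare how often samples land near the endpoints versus near the middle.
rng = random.Random(0)
samples = [sample_u_shaped_t(rng=rng) for _ in range(20000)]
near_ends = sum(t < 0.1 or t > 0.9 for t in samples) / len(samples)
near_middle = sum(0.4 <= t <= 0.6 for t in samples) / len(samples)
print(f"mass near ends: {near_ends:.3f}, mass near middle: {near_middle:.3f}")
```

Both windows cover a total width of 0.2, so under a uniform timestep distribution the two fractions would be roughly equal; under the U-shaped density the endpoint fraction is larger.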
Abstract
Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with \emph{knowledge distillation} methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 75\% in the 1 NFE setting on CIFAR-10. On ImageNet 64$\times$64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.
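The link between straight trajectories and low-NFE sampling can be seen in a toy one-dimensional example. The sketch below integrates two hand-written velocity fields with the Euler method, where each step costs one function evaluation. The fields, the endpoints `X0` and `X1`, and the helper `euler_integrate` are illustrative assumptions and not part of the released code.

```python
import math

def euler_integrate(velocity, x0, num_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with uniform Euler steps.

    Each step calls `velocity` once, so num_steps equals the NFE.
    """
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        x = x + dt * velocity(x, t)
    return x

X0, X1 = 0.0, 2.0  # hypothetical start (noise) and end (data) points

def straight_velocity(x, t):
    # Straight path x(t) = X0 + t * (X1 - X0): the velocity is constant.
    return X1 - X0

def curved_velocity(x, t):
    # Curved path x(t) = X0 + (X1 - X0) * sin(pi * t / 2): the velocity varies with t.
    return (X1 - X0) * (math.pi / 2.0) * math.cos(math.pi * t / 2.0)

# Both paths end exactly at X1, so the deviation from X1 is the truncation error.
straight_err = {n: abs(euler_integrate(straight_velocity, X0, n) - X1) for n in (1, 2, 8)}
curved_err = {n: abs(euler_integrate(curved_velocity, X0, n) - X1) for n in (1, 2, 8)}
print("straight:", straight_err)
print("curved:  ", curved_err)
```

For the straight path a single Euler step already lands on the endpoint, while the curved path needs more steps to reduce its error. This is the sense in which nearly straight trajectories make 1-2 NFE sampling viable.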
