FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner

Wenliang Zhao; Minglei Shi; Xumin Yu; Jie Zhou; Jiwen Lu

TL;DR

This paper proposes FlowTurbo, a framework that accelerates the sampling of flow-based models while also improving sampling quality, and introduces several techniques, including a pseudo corrector and sample-aware compilation, to further reduce inference time.

Abstract

Building on the success of diffusion models in visual generation, flow-based models have reemerged as another prominent family of generative models, achieving competitive or better performance in terms of both visual quality and inference speed. By learning the velocity field through flow matching, flow-based models tend to produce straighter sampling trajectories, which is advantageous during sampling. However, unlike diffusion models, for which fast samplers are well developed, efficient sampling of flow-based generative models has rarely been explored. In this paper, we propose a framework called FlowTurbo to accelerate the sampling of flow-based models while still enhancing sampling quality. Our primary observation is that the outputs of the velocity predictor in flow-based models become stable during sampling, which makes it possible to estimate the velocity with a lightweight velocity refiner. Additionally, we introduce several techniques, including a pseudo corrector and sample-aware compilation, to further reduce inference time. Since FlowTurbo does not change the multi-step sampling paradigm, it can be effectively applied to various tasks such as image editing and inpainting. By integrating FlowTurbo into different flow-based models, we obtain an acceleration ratio of 53.1% to 58.3% on class-conditional generation and 29.8% to 38.5% on text-to-image generation. Notably, FlowTurbo reaches an FID of 2.12 on ImageNet at 100 ms/img and an FID of 3.93 at 38 ms/img, achieving real-time image generation and establishing a new state of the art. Code is available at https://github.com/shiml20/FlowTurbo.
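
The two observations above (straighter trajectories and stable velocity predictions) can be illustrated with a toy one-dimensional flow ODE dx/dt = v(x, t), integrated from t = 0 to t = 1 with explicit Euler steps. This is a minimal sketch under stated assumptions, not FlowTurbo's sampler: `straight_v` and `curved_v` are hypothetical stand-ins for a learned $\mathbf{v}_\theta$, and the relative change between consecutive velocity predictions is our own proxy for stability, not the paper's curvature metric.

```python
import math

def euler_sample(v, x0, n_steps, t0=0.0, t1=1.0):
    """Integrate dx/dt = v(x, t) with explicit Euler steps.

    Returns the final state and the velocity predicted at every step.
    """
    dt = (t1 - t0) / n_steps
    x, t, preds = x0, t0, []
    for _ in range(n_steps):
        vel = v(x, t)          # one velocity evaluation per step
        preds.append(vel)
        x = x + dt * vel
        t = t + dt
    return x, preds

def relative_changes(preds):
    """Assumed stability proxy: |v_{i+1} - v_i| / |v_i| for consecutive steps."""
    return [abs(b - a) / max(abs(a), 1e-12) for a, b in zip(preds, preds[1:])]

def straight_v(x, t):
    # Hypothetical straight path from x = 0 to x = 3: the velocity never changes.
    return 3.0

def curved_v(x, t):
    # Hypothetical curved path with exact solution x(t) = sin(pi * t / 2), so x(1) = 1.
    return 0.5 * math.pi * math.cos(0.5 * math.pi * t)

if __name__ == "__main__":
    x_straight, _ = euler_sample(straight_v, 0.0, 1)
    print(x_straight)  # 3.0: a single step is exact on a straight path
    for n in (1, 4, 16):
        x_curved, p_curved = euler_sample(curved_v, 0.0, n)
        print(n, abs(x_curved - 1.0), max(relative_changes(p_curved), default=0.0))
```

On the straight path every relative change is zero and one step lands exactly on the endpoint; on the curved path the predictions drift from step to step and Euler needs many more steps to shrink its error, which is the cost a fast sampler tries to avoid.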

Paper Structure

This paper contains 35 sections, 42 equations, 8 figures, 5 tables, and 2 algorithms.

Introduction
Related Work
Method
Preliminaries: Diffusion and Flow-based Models
Efficient Estimation of Velocity
Towards Real-Time Image Generation
Discussion
Experiments
Setups
Main Results
Comparisons to State-of-the-Arts
Analysis
Conclusion
Detailed Background of Diffusion and Flow-based Models
Diffusion Models
...and 20 more sections

Figures (8)

Figure 1: Visualization of the curvatures of the sampling trajectories of different models. We compare the curvatures of the model predictions of a standard diffusion model (DiT peebles2023scalable) and several flow-based models (SiT ma2024sit, SD3-Medium esser2024scaling, FLUX.1-dev flux2024, and Open-Sora opensora) during sampling. We observe that the velocity prediction $\mathbf{v}_\theta$ of flow-based models is much more stable during sampling than the noise prediction $\bm{\epsilon}$ of diffusion models, which motivates us to seek a more lightweight estimation model to reduce the sampling cost of flow-based generative models.
Figure 2: Overview of FlowTurbo. (a) Motivated by the stability of the velocity predictor's outputs during sampling, we propose to learn a lightweight velocity refiner that regresses the offset of the velocity field. (b)(c) We propose the pseudo corrector, which leverages a velocity cache to reduce the number of model evaluations while maintaining the same convergence order as Heun's method. (d) During sampling, we employ a combination of Heun's method, the pseudo corrector, and the velocity refiner, and each sample block is processed with the proposed sample-aware compilation.
Figure 3: FlowTurbo exhibits favorable speed-quality trade-offs compared with state-of-the-art (SOTA) methods.
Figure 4: Qualitative results. (a) We compare FlowTurbo with Heun's method on Lumina-Next-T2I gao2024lumina. Our method achieves better image quality while requiring much less sampling time ($-30.8\%$). (b) Since FlowTurbo retains the multi-step sampling paradigm, it can be seamlessly applied to other applications such as image inpainting, image editing, and object removal.
Figure 5: Random samples from FlowTurbo on ImageNet 256 × 256. We use a classifier-free guidance scale of 4.0 and the sampling configuration $H_8P_9R_5$ (100 ms/img).
...and 3 more figures
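
Figure 2(b)(c) describes the pseudo corrector only at the level of a caption, so the following is a minimal sketch under a stated assumption: the velocity cache holds the velocity evaluated at the previous step's predicted point, and that cached value replaces the fresh evaluation at the start of the next Heun step. This lowers the cost from two velocity evaluations per step to one (plus one initial evaluation). The toy ODE dx/dt = x, with exact solution x(1) = e from x(0) = 1, and all function names are hypothetical.

```python
import math

def euler_sample(v, x0, n_steps, t0=0.0, t1=1.0):
    """Explicit Euler: one velocity evaluation per step. Returns (x, nfe)."""
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    for _ in range(n_steps):
        x = x + h * v(x, t)
        t += h
    return x, n_steps

def heun_sample(v, x0, n_steps, t0=0.0, t1=1.0):
    """Heun's method: two velocity evaluations per step. Returns (x, nfe)."""
    h = (t1 - t0) / n_steps
    x, t, nfe = x0, t0, 0
    for _ in range(n_steps):
        v_cur = v(x, t)                      # velocity at the current point
        x_pred = x + h * v_cur               # Euler predictor
        v_pred = v(x_pred, t + h)            # velocity at the predicted point
        x = x + 0.5 * h * (v_cur + v_pred)   # trapezoidal corrector
        t += h
        nfe += 2
    return x, nfe

def pseudo_corrector_sample(v, x0, n_steps, t0=0.0, t1=1.0):
    """Assumed pseudo corrector: Heun-style update that reuses a cached velocity."""
    h = (t1 - t0) / n_steps
    x, t = x0, t0
    v_cache = v(x, t)   # seed the cache with one explicit evaluation
    nfe = 1
    for _ in range(n_steps):
        x_pred = x + h * v_cache             # predictor uses the cached velocity
        v_pred = v(x_pred, t + h)            # the only new evaluation in this step
        nfe += 1
        x = x + 0.5 * h * (v_cache + v_pred)
        v_cache = v_pred                     # reused in place of v(x, t + h) next step
        t += h
    return x, nfe

def toy_field(x, t):
    # Hypothetical velocity field dx/dt = x; starting from x(0) = 1, x(1) = e.
    return x

if __name__ == "__main__":
    for name, sampler, n in [("euler", euler_sample, 5),
                             ("pseudo", pseudo_corrector_sample, 4),
                             ("heun", heun_sample, 4)]:
        x, nfe = sampler(toy_field, 1.0, n)
        print(f"{name:6s} nfe={nfe} error={abs(x - math.e):.4f}")
```

In this toy run the pseudo corrector, with 5 evaluations, ends much closer to e than Euler with the same 5 evaluations, while full Heun spends 8 evaluations for a smaller error still. The sketch only illustrates the evaluation savings; it does not verify the convergence-order claim in the caption.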
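
The configuration name $H_8P_9R_5$ in Figure 5 appears to encode step counts for the sample blocks of Figure 2(d). A minimal sketch, assuming it means 8 Heun steps, 9 pseudo-corrector steps, and 5 velocity-refiner steps run in that order, and assuming a cost model in which a Heun step calls the velocity predictor twice, a pseudo-corrector step calls it once (thanks to the velocity cache), and a refiner step calls only the lightweight refiner. `SampleConfig`, `parse_config`, `block_schedule`, and `call_counts` are hypothetical names.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class SampleConfig:
    heun_steps: int     # assumed meaning of H_n
    pseudo_steps: int   # assumed meaning of P_n
    refiner_steps: int  # assumed meaning of R_n

def parse_config(name):
    """Parse a name such as 'H8P9R5' into step counts (hypothetical helper)."""
    m = re.fullmatch(r"H(\d+)P(\d+)R(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised sampling config: {name!r}")
    return SampleConfig(*(int(g) for g in m.groups()))

def block_schedule(cfg):
    """Assumed block order: Heun steps, then pseudo-corrector steps, then refiner steps."""
    return (["heun"] * cfg.heun_steps
            + ["pseudo"] * cfg.pseudo_steps
            + ["refiner"] * cfg.refiner_steps)

def call_counts(cfg):
    """Model calls under the assumed cost model.

    Heun: 2 predictor calls per step. Pseudo corrector: 1 predictor call per step,
    reusing the cached velocity. Refiner: 1 refiner call and no predictor call.
    Without any Heun step, one extra predictor call seeds the velocity cache.
    """
    needs_seed = cfg.heun_steps == 0 and (cfg.pseudo_steps + cfg.refiner_steps) > 0
    predictor = 2 * cfg.heun_steps + cfg.pseudo_steps + int(needs_seed)
    return {"predictor": predictor, "refiner": cfg.refiner_steps}

if __name__ == "__main__":
    cfg = parse_config("H8P9R5")
    print(cfg, len(block_schedule(cfg)), call_counts(cfg))
```

Under these assumptions, $H_8P_9R_5$ runs 22 sampling steps with 25 velocity-predictor calls and 5 refiner calls. Actual latency also depends on the refiner's size and on sample-aware compilation, which this sketch does not model.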
