StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation
Sen Fang, Hongbin Zhong, Yalin Feng, Dimitris N. Metaxas
TL;DR
Rectified Flow models suffer from slow inference due to their dynamic time-step structures. StreamFlow introduces a cohesive acceleration framework with batched velocity-field processing, vectorized time windows, and runtime-adaptive TensorRT compilation to handle heterogeneous timesteps. The approach achieves up to 611% speedup on 512×512 images and maintains high generation quality, with robust scalability to larger resolutions. This work enables practical deployment of large-scale flow-based generative models by delivering substantial throughput gains without sacrificing fidelity.
Abstract
New techniques such as Rectified Flow and Flow Matching have significantly improved generative models over the past two years, especially in control accuracy, generation quality, and generation efficiency. However, because Rectified Flow differs from existing diffusion models in its theory and design, existing acceleration methods cannot be applied to Rectified Flow models directly. In this article, we implement a complete acceleration pipeline spanning theory, design, and inference strategy. The pipeline introduces batched processing of the velocity field, vectorized batch processing of heterogeneous time steps, and dynamic TensorRT compilation tailored to these methods, which together accelerate models built on flow-based generation. Existing public methods typically achieve a speedup of 18%, whereas our experiments show that the proposed method accelerates 512×512 image generation by up to 611%, far beyond current non-generalized acceleration methods.
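The abstract does not spell out how a batch with heterogeneous time steps is processed in one pass. The sketch below is illustrative only and is not the authors' implementation. It assumes a NumPy setting, per-sample time grids padded to a common length, and a toy analytic velocity field in place of the learned network. The names `toy_velocity` and `batched_euler` are hypothetical. Each iteration makes a single batched velocity call while every sample sits at its own time t_i with its own step size dt_i.

```python
import numpy as np

def toy_velocity(x, t, mu):
    """Hypothetical stand-in for a learned velocity network v_theta(x, t).

    For a straight path x_t = (1 - t) * x0 + t * mu toward a single target mu,
    the exact velocity is (mu - x) / (1 - t). Shapes: x (B, D), t (B,), mu (D,).
    """
    denom = np.maximum(1.0 - t, 1e-12)[:, None]  # avoid division by zero once t == 1
    return (mu - x) / denom

def batched_euler(x, schedules, mu):
    """Euler-integrate dx/dt = v(x, t) for samples that follow different time grids.

    schedules: list of increasing 1-D arrays, one per sample.
    Every loop iteration issues ONE vectorized velocity call for the whole batch.
    """
    max_len = max(len(s) for s in schedules)
    # Pad each grid by repeating its last time, so finished samples get dt = 0.
    grid = np.stack([np.pad(s, (0, max_len - len(s)), mode="edge") for s in schedules])
    for k in range(max_len - 1):
        t = grid[:, k]              # per-sample current time, shape (B,)
        dt = grid[:, k + 1] - t     # per-sample step size, zero after a sample finishes
        v = toy_velocity(x, t, mu)  # single call over the heterogeneous batch
        x = x + dt[:, None] * v     # broadcast each sample's dt over its features
    return x

rng = np.random.default_rng(0)
mu = np.array([2.0, -1.0, 0.5])
x0 = rng.standard_normal((3, 3))
schedules = [np.linspace(0.0, 1.0, 5), np.array([0.0, 0.3, 0.9, 1.0]), np.array([0.0, 1.0])]
out = batched_euler(x0, schedules, mu)
print(np.allclose(out, mu))  # True
```

Because the toy trajectories are straight lines, Euler steps land every sample on `mu` whatever its schedule, up to floating-point error. Padding with zero-length steps keeps the batch shape fixed, which suits static-shape compilation, at the cost of computing velocities for samples that have already finished.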
