SuperFlow: Training Flow Matching Models with RL on the Fly

Kaijie Chen; Zhiyang Xu; Ying Shen; Zihao Lin; Yuguang Yao; Lifu Huang

TL;DR

SuperFlow addresses two inefficiencies in RL-based training of flow matching models by introducing variance-aware Streaming-to-Group Sampling and step-level advantage re-estimation along the denoising trajectory. By coupling per-prompt trackers with adaptive sampling and a refined credit assignment mechanism, it achieves competitive or superior results while using substantially fewer training steps and less compute. The approach yields consistent gains across text rendering, compositional image generation, and human-preference alignment, demonstrating robust generalization to different rewards and tasks. Overall, SuperFlow offers a practical, scalable enhancement to flow-based T2I post-training without architectural changes, improving stability and efficiency in RL for diffusion/flow models.

Abstract

Recent progress in flow-based generative models and reinforcement learning (RL) has improved text-image alignment and visual quality. However, current RL training for flow models still has two main problems: (i) GRPO-style fixed per-prompt group sizes ignore variation in sampling importance across prompts, which leads to inefficient sampling and slower training; and (ii) trajectory-level advantages are reused as per-step estimates, which biases credit assignment along the flow. We propose SuperFlow, an RL training framework for flow-based models that adjusts group sizes with variance-aware sampling and computes step-level advantages in a way that is consistent with continuous-time flow dynamics. Empirically, SuperFlow reaches promising performance while using only 5.4% to 56.3% of the original training steps and reduces training time by 5.2% to 16.7% without any architectural changes. On standard text-to-image (T2I) tasks, including text rendering, compositional image generation, and human preference alignment, SuperFlow improves over SD3.5-M by 4.6% to 47.2%, and over Flow-GRPO by 1.7% to 16.0%.
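To make the first problem concrete, here is a minimal Python sketch of the two ideas the abstract contrasts. `group_advantages` is the standard GRPO-style normalization of rewards within one prompt's group. `allocate_group_sizes` is a hypothetical variance-proportional rule for splitting a sample budget across prompts. The abstract does not specify SuperFlow's actual sampling rule, so the function names, the `min_size` floor, and the proportional split are all assumptions made for illustration only.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def allocate_group_sizes(reward_stds, total_samples, min_size=2):
    """Hypothetical rule (not the paper's): give each prompt a floor of
    min_size samples, then split the rest in proportion to its tracked
    reward standard deviation."""
    n = len(reward_stds)
    if total_samples < n * min_size:
        raise ValueError("budget too small for the per-prompt floor")
    spare = total_samples - n * min_size
    total_std = sum(reward_stds)
    if total_std == 0:
        shares = [spare / n] * n  # no signal: fall back to a uniform split
    else:
        shares = [spare * s / total_std for s in reward_stds]
    sizes = [min_size + int(share) for share in shares]
    # Hand out samples lost to rounding by largest fractional remainder.
    leftover = total_samples - sum(sizes)
    order = sorted(range(n), key=lambda i: shares[i] - int(shares[i]), reverse=True)
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# A prompt whose samples all score alike gets only the floor;
# prompts with more reward spread get more of the budget.
sizes = allocate_group_sizes([0.0, 0.25, 0.75], total_samples=14)
advantages = group_advantages([1.0, 0.0, 0.5, 0.5])
```

With a fixed group size, the first prompt above would consume as many samples as the third even though its group-normalized advantages are all near zero and contribute little gradient signal.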
