SuperFlow: Training Flow Matching Models with RL on the Fly

Kaijie Chen, Zhiyang Xu, Ying Shen, Zihao Lin, Yuguang Yao, Lifu Huang

TL;DR

SuperFlow targets two inefficiencies in RL training of flow matching models: fixed per-prompt group sizes and trajectory-level advantages that are reused at every step. It introduces variance-aware Streaming-to-Group Sampling and step-level advantage re-estimation along the denoising trajectory. By coupling per-prompt trackers with adaptive sampling and a refined credit assignment mechanism, it achieves competitive or superior results while using substantially fewer training steps and less compute. The approach yields consistent gains across text rendering, compositional image generation, and human-preference alignment, which indicates that it generalizes across rewards and tasks. Overall, SuperFlow offers a practical, scalable enhancement to flow-based T2I post-training without architectural changes, improving the stability and efficiency of RL for diffusion and flow models.

Abstract

Recent progress in flow-based generative models and reinforcement learning (RL) has improved text-image alignment and visual quality. However, current RL training for flow models still has two main problems: (i) GRPO-style fixed per-prompt group sizes ignore variation in sampling importance across prompts, which leads to inefficient sampling and slower training; and (ii) trajectory-level advantages are reused as per-step estimates, which biases credit assignment along the flow. We propose SuperFlow, an RL training framework for flow-based models that adjusts group sizes with variance-aware sampling and computes step-level advantages in a way that is consistent with continuous-time flow dynamics. Empirically, SuperFlow reaches promising performance while using only 5.4% to 56.3% of the original training steps and reduces training time by 5.2% to 16.7% without any architectural changes. On standard text-to-image (T2I) tasks, including text rendering, compositional image generation, and human preference alignment, SuperFlow improves over SD3.5-M by 4.6% to 47.2%, and over Flow-GRPO by 1.7% to 16.0%.
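
To make the idea of variance-aware group sizing concrete, here is a minimal Python sketch. It is not the paper's Streaming-to-Group Sampling procedure: the function name `allocate_group_sizes`, the proportional-to-standard-deviation rule, and the `min_size` floor are all assumptions chosen for illustration. It only shows how a fixed sampling budget could be shifted toward prompts whose rewards vary more.

```python
import math


def allocate_group_sizes(reward_std, total_budget, min_size=2):
    """Split a sample budget across prompts in proportion to reward std.

    Hypothetical helper, not taken from the paper: every prompt gets
    `min_size` samples, and the remaining budget is shared in proportion
    to each prompt's tracked reward standard deviation.
    """
    n = len(reward_std)
    if total_budget < n * min_size:
        raise ValueError("budget too small for the minimum group size")

    spare = total_budget - n * min_size
    total_std = sum(reward_std)
    if total_std == 0:
        # No variance signal yet: fall back to a uniform split.
        shares = [spare / n] * n
    else:
        shares = [spare * s / total_std for s in reward_std]

    sizes = [min_size + math.floor(x) for x in shares]

    # Hand out samples lost to rounding, largest fractional part first.
    leftover = total_budget - sum(sizes)
    order = sorted(
        range(n),
        key=lambda i: shares[i] - math.floor(shares[i]),
        reverse=True,
    )
    for i in order[:leftover]:
        sizes[i] += 1
    return sizes


# A prompt with zero reward variance keeps only the minimum group size.
print(allocate_group_sizes([0.0, 0.5, 0.5], total_budget=12))
```

Under this rule, a prompt whose samples all receive the same reward stops consuming extra budget, which is the kind of wasted sampling the abstract attributes to fixed group sizes.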

Paper Structure

This paper contains 65 sections, 45 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: SuperFlow. Left: Variance-aware group sampling allocates more samples to prompts with higher reward variance and uses a value tracker instead of a per-group mean, reducing wasted samples and equal-advantage cases in Flow-GRPO. Right: Step-level advantage re-estimation along the denoising path. As reward variance decreases over time steps, recomputing advantages at each step yields more accurate credit assignment for flow training.
  • Figure 2: SuperFlow: Qualitative Comparison on the Visual Text Rendering Task. Our approach achieves higher text accuracy and readability compared with baselines.
  • Figure 3: SuperFlow: Qualitative Comparison on the Compositional Image Generation Task. Our method improves accuracy in object composition, position, and attribute consistency.
  • Figure 4: SuperFlow: Qualitative Comparison on the Human Preference Alignment Task. Our method produces images that better match human-preferred visual quality and prompt alignment.
  • Figure 5: OCR training time.
  • ...and 4 more figures
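
The Figure 1 caption contrasts a value tracker with a per-group mean baseline. The Python sketch below illustrates why that matters in the equal-advantage case. It is an illustration under stated assumptions, not the paper's implementation: the `ValueTracker` class, its exponential-moving-average update, and the `momentum` parameter are hypothetical, and the standard-deviation normalization used in GRPO-style training is omitted for brevity.

```python
def group_mean_advantages(rewards):
    """Advantages relative to the mean of the current group."""
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]


class ValueTracker:
    """Hypothetical per-prompt baseline kept as an exponential moving average."""

    def __init__(self, initial=0.0, momentum=0.9):
        self.value = initial
        self.momentum = momentum

    def advantages(self, rewards):
        # Compare each reward with the running estimate, not the group mean.
        return [r - self.value for r in rewards]

    def update(self, rewards):
        # Move the running estimate toward the mean reward of the new group.
        mean = sum(rewards) / len(rewards)
        self.value = self.momentum * self.value + (1 - self.momentum) * mean


# Every sample in the group earns the same reward.
rewards = [1.0, 1.0, 1.0, 1.0]

# Group-mean baseline: all advantages are zero, so the group gives no signal.
print(group_mean_advantages(rewards))

# Tracker baseline: the group still registers as better than past samples.
tracker = ValueTracker(initial=0.5, momentum=0.5)
print(tracker.advantages(rewards))
tracker.update(rewards)
print(tracker.value)
```

With a per-group mean, a group of identical rewards produces zero advantages and the samples are effectively wasted. A baseline that persists across groups can still tell the model that these samples were better or worse than what the prompt produced earlier.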