Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models

Bowen Ping; Chengyou Jia; Minnan Luo; Hangwei Qian; Ivor Tsang

TL;DR

Flow-Factory is a unified framework for reinforcement learning in flow-matching models. Its modular, registry-based architecture decouples algorithms, models, and rewards, so that new algorithms and architectures can be integrated with little implementation overhead.

Abstract

Reinforcement learning has emerged as a promising paradigm for aligning diffusion and flow-matching models with human preferences, yet practitioners face fragmented codebases, model-specific implementations, and engineering complexity. We introduce Flow-Factory, a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture. This design enables seamless integration of new algorithms and architectures, as demonstrated by our support for GRPO, DiffusionNFT, and AWM across Flux, Qwen-Image, and WAN video models. By minimizing implementation overhead, Flow-Factory lets researchers rapidly prototype and scale future innovations. It also provides production-ready memory optimization, flexible multi-reward training, and seamless distributed training support. The codebase is available at https://github.com/X-GenGroup/Flow-Factory.

Paper Structure (16 sections, 3 equations, 3 figures, 2 tables)

Introduction
Fragmented and Coupled Codebases.
Training Inefficiency and Memory Bottlenecks.
Limited Reward Model Flexibility.
Framework Design
Registry-Based Component Decoupling
Preprocessing-Based Memory Optimization
Multi-Reward System
Supported Algorithms
Flow-GRPO and Variants
DiffusionNFT and AWM
Experiments
Experimental Setup
Reproduction of Published Results
Training Efficiency
...and 1 more sections

Figures (3)

Figure 1: Flow-Factory architecture overview. Top: The registry-based design decouples Models, Trainers, and Rewards, enabling flexible combinations through YAML configuration. Middle: Preprocessing optimizes memory by caching embeddings and offloading frozen components. Bottom: The multi-reward system supports both pointwise and groupwise rewards with automatic deduplication and configurable advantage aggregation.
Figure 2: Reproduction of reward curves for Flow-GRPO, DiffusionNFT, and AWM on Flux.1-dev with PickScore reward.
Figure 3: Qualitative comparison of different RL algorithms on Flux.1-dev with PickScore reward. Each column shows generations for the same prompt across methods. All RL-finetuned models show improved visual quality compared to the base Flux.1-dev model.
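
The registry-based decoupling summarized in the Figure 1 caption can be pictured with a minimal Python sketch. This is an illustration of the general registry pattern only, not Flow-Factory's actual API: the names `REGISTRIES`, `register`, `build`, `LengthReward`, `ToyTrainer`, and `build_from_config` are all hypothetical, and a plain dictionary stands in for a parsed YAML configuration.

```python
from typing import Callable, Dict

# Hypothetical registries, one per component kind (illustrative names only).
REGISTRIES: Dict[str, Dict[str, Callable]] = {"model": {}, "trainer": {}, "reward": {}}


def register(kind: str, name: str):
    """Decorator that records a class or factory under (kind, name)."""
    def decorator(factory):
        if name in REGISTRIES[kind]:
            raise KeyError(f"{kind} '{name}' is already registered")
        REGISTRIES[kind][name] = factory
        return factory
    return decorator


def build(kind: str, name: str, **kwargs):
    """Look up a registered component by name and instantiate it."""
    if name not in REGISTRIES[kind]:
        raise KeyError(f"unknown {kind} '{name}'")
    return REGISTRIES[kind][name](**kwargs)


@register("reward", "length")
class LengthReward:
    """Toy pointwise reward: scores a string by its scaled length."""
    def __init__(self, scale: float = 1.0):
        self.scale = scale

    def __call__(self, sample: str) -> float:
        return self.scale * len(sample)


@register("trainer", "toy")
class ToyTrainer:
    """Toy trainer that only knows how to score samples with its reward."""
    def __init__(self, reward):
        self.reward = reward

    def score(self, samples):
        return [self.reward(s) for s in samples]


def build_from_config(cfg: dict):
    """Assemble a trainer from a config dict (stand-in for parsed YAML)."""
    reward_cfg = dict(cfg["reward"])  # copy so the caller's config is not mutated
    reward = build("reward", reward_cfg.pop("name"), **reward_cfg)
    return build("trainer", cfg["trainer"], reward=reward)


config = {"trainer": "toy", "reward": {"name": "length", "scale": 0.5}}
trainer = build_from_config(config)
```

Because the trainer only sees a reward object built by name, swapping the reward (or the trainer) in this sketch is a one-line change to `config` rather than a code change, which is the kind of flexibility the caption attributes to the registry design.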
