Flow-Factory: A Unified Framework for Reinforcement Learning in Flow-Matching Models

Bowen Ping, Chengyou Jia, Minnan Luo, Hangwei Qian, Ivor Tsang

TL;DR

Flow-Factory is a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture, enabling seamless integration of new algorithms and architectures.

Abstract

Reinforcement learning has emerged as a promising paradigm for aligning diffusion and flow-matching models with human preferences, yet practitioners face fragmented codebases, model-specific implementations, and engineering complexity. We introduce Flow-Factory, a unified framework that decouples algorithms, models, and rewards through a modular, registry-based architecture. This design enables seamless integration of new algorithms and architectures, as demonstrated by our support for GRPO, DiffusionNFT, and AWM across Flux, Qwen-Image, and WAN video models. By minimizing implementation overhead, Flow-Factory empowers researchers to rapidly prototype and scale future innovations with ease. Flow-Factory provides production-ready memory optimization, flexible multi-reward training, and seamless distributed training support. The codebase is available at https://github.com/X-GenGroup/Flow-Factory.
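To make the registry idea in the abstract concrete, the following is a minimal sketch of how a registry-based design can decouple models, rewards, and trainers so that a configuration selects each one by name. It is an illustration only, not Flow-Factory's actual API: the `Registry` class, the `MODELS`/`REWARDS`/`TRAINERS` instances, the toy component classes, and `build_from_config` are all hypothetical names, and a plain dictionary stands in for a parsed YAML file.

```python
from typing import Any, Callable, Dict


class Registry:
    """Maps string keys to classes so components can be selected by name."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._entries: Dict[str, type] = {}

    def register(self, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(cls: type) -> type:
            if name in self._entries:
                raise KeyError(f"{self.kind} '{name}' is already registered")
            self._entries[name] = cls
            return cls
        return decorator

    def build(self, name: str, **kwargs: Any) -> Any:
        """Instantiate the class registered under `name`."""
        if name not in self._entries:
            known = ", ".join(sorted(self._entries)) or "<none>"
            raise KeyError(f"Unknown {self.kind} '{name}'. Known: {known}")
        return self._entries[name](**kwargs)


# Hypothetical registries, one per component family.
MODELS = Registry("model")
REWARDS = Registry("reward")
TRAINERS = Registry("trainer")


@MODELS.register("toy-flow")
class ToyFlowModel:
    def __init__(self, hidden_size: int = 8) -> None:
        self.hidden_size = hidden_size


@REWARDS.register("length")
class LengthReward:
    """Toy reward: scores a sample by its length."""
    def __call__(self, sample: str) -> float:
        return float(len(sample))


@TRAINERS.register("toy-trainer")
class ToyTrainer:
    """The trainer only sees the model and reward interfaces, not their types."""
    def __init__(self, model: Any, reward: Any) -> None:
        self.model = model
        self.reward = reward


def build_from_config(config: Dict[str, Dict[str, Any]]) -> Any:
    """Assemble a trainer from names and arguments given in a config mapping."""
    model = MODELS.build(config["model"]["name"], **config["model"].get("args", {}))
    reward = REWARDS.build(config["reward"]["name"], **config["reward"].get("args", {}))
    return TRAINERS.build(config["trainer"]["name"], model=model, reward=reward)


# Stand-in for a parsed YAML configuration file.
config = {
    "model": {"name": "toy-flow", "args": {"hidden_size": 16}},
    "reward": {"name": "length"},
    "trainer": {"name": "toy-trainer"},
}
trainer = build_from_config(config)
```

Under this pattern, adding a new algorithm or model amounts to registering one more class; swapping components is a change to the configuration rather than to the training code.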

Paper Structure (16 sections, 3 equations, 3 figures, 2 tables)

Figures (3)

  • Figure 1: Flow-Factory architecture overview. Top: The registry-based design decouples Models, Trainers, and Rewards, enabling flexible combinations through YAML configuration. Middle: Preprocessing optimizes memory by caching embeddings and offloading frozen components. Bottom: The multi-reward system supports both pointwise and groupwise rewards with automatic deduplication and configurable advantage aggregation.
  • Figure 2: Reproduction of reward curves for Flow-GRPO, DiffusionNFT, and AWM on Flux.1-dev with PickScore reward.
  • Figure 3: Qualitative comparison of different RL algorithms on Flux.1-dev with PickScore reward. Each column shows generations for the same prompt across methods. All RL-finetuned models show improved visual quality compared to the base Flux.1-dev model.
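
The Figure 1 caption mentions configurable advantage aggregation across multiple rewards. The source does not specify how Flow-Factory performs this aggregation, so the sketch below shows one common choice as an assumption: normalize each reward within a group of samples generated from the same prompt (GRPO-style mean and standard deviation normalization), then combine the per-reward advantages with a weighted sum. The function names, reward names, and weights are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def group_normalize(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Z-score rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def aggregate_advantages(
    rewards_by_name: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted sum of per-reward, group-normalized advantages."""
    if not rewards_by_name:
        raise ValueError("at least one reward is required")
    group_size = len(next(iter(rewards_by_name.values())))
    total = [0.0] * group_size
    for name, scores in rewards_by_name.items():
        if len(scores) != group_size:
            raise ValueError(f"reward '{name}' has a mismatched group size")
        for i, advantage in enumerate(group_normalize(scores)):
            total[i] += weights[name] * advantage
    return total


# One group of four samples scored by two hypothetical rewards.
group_rewards = {
    "aesthetic": [0.2, 0.4, 0.6, 0.8],
    "text_alignment": [1.0, 0.0, 1.0, 0.0],
}
weights = {"aesthetic": 1.0, "text_alignment": 0.5}
advantages = aggregate_advantages(group_rewards, weights)
```

Normalizing each reward before weighting keeps rewards with different scales from dominating the combined signal; other schemes, such as summing raw rewards before normalizing, are equally plausible readings of the caption.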