OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning

Yuan Gong; Xionghui Wang; Jie Wu; Shiyin Wang; Yitong Wang; Xinglong Wu

TL;DR

OneReward addresses the challenge of unifying multi-task mask-guided image editing under diverse evaluation criteria by using a single Vision-Language Reward Model to guide reinforcement learning. The authors introduce Seedream 3.0 Fill, a multi-task RLHF-based model trained directly on a pre-trained base without task-specific SFT, achieving state-of-the-art results across image fill, extend, removal, and text rendering. They also present a dynamic reinforcement learning variant and open-source FLUX Fill [OneReward], demonstrating robust generalization and practical benefits for unified image editing. The work highlights the potential of unified reward modeling to streamline training and improve cross-task performance in diffusion/flow matching settings.

Abstract

In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances a model's generative capabilities across multiple tasks under different evaluation criteria using only One Reward model. A single vision-language model (VLM) serves as the generative reward model: because it can distinguish the winner from the loser for a given task and a given evaluation criterion, it can be applied effectively to multi-task generation models, particularly in contexts with varied data and diverse task objectives. We apply OneReward to mask-guided image generation, which can be divided into several sub-tasks, such as image fill, image extend, object removal, and text rendering, each using a binary mask to mark the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in their underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified edit model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io
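The idea of one reward model serving several tasks and criteria can be illustrated with a minimal sketch. It is not the authors' implementation: the names `RewardQuery`, `build_prompt`, `preference_reward`, and `toy_judge`, the prompt wording, and the criterion strings are all assumptions made for illustration. The only detail taken from the abstract is that the task and the evaluation criterion are supplied to a single judge that picks a winner between two images.

```python
from dataclasses import dataclass
from typing import Callable

# Sub-tasks named in the abstract.
TASKS = ("image fill", "image extend", "object removal", "text rendering")


@dataclass(frozen=True)
class RewardQuery:
    """Hypothetical container: which task and which criterion to judge."""
    task: str
    criterion: str  # criterion names are illustrative, not from the paper


def build_prompt(query: RewardQuery) -> str:
    """Fold task and criterion into one text query for a shared judge."""
    if query.task not in TASKS:
        raise ValueError(f"unknown task: {query.task!r}")
    return (
        f"Task: {query.task}. Criterion: {query.criterion}. "
        "Is the first image better than the second?"
    )


# A judge maps (prompt, image_a, image_b) to P(image_a wins) in [0, 1].
# In practice this would wrap a VLM; here it is only a callable.
Judge = Callable[[str, str, str], float]


def preference_reward(judge: Judge, query: RewardQuery,
                      candidate: str, reference: str) -> float:
    """Reward for `candidate`: the judge's probability that it beats `reference`."""
    p = judge(build_prompt(query), candidate, reference)
    if not 0.0 <= p <= 1.0:
        raise ValueError("judge must return a probability in [0, 1]")
    return p


def toy_judge(prompt: str, image_a: str, image_b: str) -> float:
    """Placeholder judge (assumption): prefers the longer identifier string."""
    return 1.0 if len(image_a) > len(image_b) else 0.0
```

Because the task and criterion travel inside the query, the same `judge` can be reused for every sub-task; only the `RewardQuery` changes between calls.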
