PSR: Scaling Multi-Subject Personalized Image Generation with Pairwise Subject-Consistency Rewards

Shulei Wang, Longhui Wei, Xin He, Jianbo Ouyang, Hui Lu, Zhou Zhao, Qi Tian

TL;DR

PSR tackles the challenge of multi-subject image generation by introducing a scalable data-generation pipeline that leverages strong single-subject personalization models, a frame-wise positional encoding scheme, and a novel Pairwise Subject-Consistency Reward to improve subject fidelity and text controllability. The approach is complemented by PSRBench, a fine-grained benchmark assessing subject consistency, semantic alignment, and aesthetic quality across seven sub-tasks. Empirical results show state-of-the-art performance on PSRBench and competitive performance on DreamBench, validating scalability to up to four subjects and robustness across complex prompts. These contributions offer a practical and scalable path toward reliable multi-subject personalized generation with precise textual guidance.

Abstract

Personalized generation models for a single subject have demonstrated remarkable effectiveness, highlighting their significant potential. However, when extended to multiple subjects, existing models often exhibit degraded performance, particularly in maintaining subject consistency and adhering to textual prompts. We attribute these limitations to the absence of high-quality multi-subject datasets and refined post-training strategies. To address these challenges, we propose a scalable multi-subject data generation pipeline that leverages powerful single-subject generation models to construct diverse and high-quality multi-subject training data. Through this dataset, we first enable single-subject personalization models to acquire knowledge of synthesizing multi-image and multi-subject scenarios. Furthermore, to enhance both subject consistency and text controllability, we design a set of Pairwise Subject-Consistency Rewards and general-purpose rewards, which are incorporated into a refined reinforcement learning stage. To comprehensively evaluate multi-subject personalization, we introduce a new benchmark that assesses model performance using seven subsets across three dimensions. Extensive experiments demonstrate the effectiveness of our approach in advancing multi-subject personalized image generation. GitHub link: https://github.com/wang-shulei/PSR

Paper Structure

This paper contains 26 sections, 8 equations, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: Quantitative comparison of recent methods on PSRBench across three evaluation dimensions: subject consistency, aesthetic preference, and semantic alignment. Our method consistently outperforms all baselines across all seven subsets.
  • Figure 2: Overview of the dataset construction pipeline. The process consists of two stages: (1) multi-subject paired image generation, where LLM-guided prompts are used to generate paired subject-aware images via T2I and personalized models, and (2) multi-subject paired instruction generation, which edits image attributes or actions and produces corresponding textual instructions. The final dataset contains diverse paired samples with consistent multi-subject semantics.
  • Figure 3: Left: Scalable frame-wise positional encoding. Middle: Pairwise subject-consistency rewards. Right: GRPO training pipeline combining PSR with multiple rewards.
  • Figure 4: Qualitative comparison of PSR with recent state-of-the-art models.
  • Figure 5: Overview of PSRBench, with a case from each subset shown on the left and the three evaluation dimensions for each subset displayed on the right.
  • ...and 5 more figures
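
The Figure 3 caption mentions a GRPO training pipeline that combines PSR with multiple rewards. As a rough illustration of the group-relative step only, the sketch below combines several reward terms into one scalar per sampled image and standardizes those scalars within the group. The reward names, weights, and scores are hypothetical placeholders, and the weighted-sum combination is an assumption; none of this is taken from the authors' implementation.

```python
from statistics import mean, pstdev


def combine_rewards(reward_terms, weights):
    """Weighted sum of named reward terms for one sample (assumed combination rule)."""
    return sum(weights[name] * value for name, value in reward_terms.items())


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps guards against a zero spread
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four images sampled from the same prompt.
group = [
    {"subject_consistency": 0.82, "semantic_alignment": 0.70, "aesthetic": 0.60},
    {"subject_consistency": 0.55, "semantic_alignment": 0.75, "aesthetic": 0.65},
    {"subject_consistency": 0.90, "semantic_alignment": 0.80, "aesthetic": 0.70},
    {"subject_consistency": 0.40, "semantic_alignment": 0.50, "aesthetic": 0.55},
]

# Hypothetical weights; the paper's actual weighting is not stated here.
weights = {"subject_consistency": 0.5, "semantic_alignment": 0.3, "aesthetic": 0.2}

totals = [combine_rewards(sample, weights) for sample in group]
advantages = group_relative_advantages(totals)
```

Samples that score above the group mean receive positive advantages and those below receive negative ones, so the policy update depends on relative quality within the group rather than on the absolute reward scale.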