Personalized Reward Modeling for Text-to-Image Generation

Jeongeun Lee, Ryang Heo, Dongha Lee

TL;DR

The paper tackles the problem of aligning text-to-image (T2I) generation with individual user preferences. It introduces PIGReward, an LVLM-based (large vision-language model) personalized reward model that derives user-conditioned evaluation dimensions through reasoning and a self-bootstrapping strategy, and PIGBench, a per-user preference benchmark. The model comprises a preference reasoner for context construction and a reward model for structured, multi-dimensional evaluation, enabling personalized feedback for prompt optimization without user-specific training. Experiments show that PIGReward outperforms traditional similarity-based and non-personalized baselines in accuracy and interpretability, demonstrating a scalable path toward individually aligned T2I generation.
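To make user-conditioned, multi-dimensional evaluation concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the context built by the preference reasoner can be reduced to a set of evaluation dimensions with per-user weights, and that the reward model's pairwise judgment can be approximated by a weighted average of per-dimension scores. PIGReward itself reaches its verdict through LVLM chain-of-thought reasoning; `UserContext`, `overall_score`, `prefer`, and all scores and weights below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserContext:
    """Hypothetical stand-in for the preference reasoner's output:
    evaluation dimensions this user cares about, mapped to importance weights."""
    weights: dict[str, float]


def overall_score(dim_scores: dict[str, float], ctx: UserContext) -> float:
    """Weighted average of per-dimension scores under one user's weights.
    Dimensions the user does not weight are ignored; missing scores count as 0."""
    total = sum(ctx.weights.values())
    return sum(w * dim_scores.get(d, 0.0) for d, w in ctx.weights.items()) / total


def prefer(img_a: dict[str, float], img_b: dict[str, float], ctx: UserContext) -> str:
    """Return "A" or "B": the image this user is predicted to prefer (ties go to A)."""
    return "A" if overall_score(img_a, ctx) >= overall_score(img_b, ctx) else "B"


# Illustrative per-dimension scores for one image pair (not data from the paper).
image_a = {"realism": 0.9, "color_vibrancy": 0.3, "composition": 0.7}
image_b = {"realism": 0.4, "color_vibrancy": 0.9, "composition": 0.6}

user_1 = UserContext({"realism": 0.7, "composition": 0.3})
user_2 = UserContext({"color_vibrancy": 0.8, "composition": 0.2})

print(prefer(image_a, image_b, user_1))  # A
print(prefer(image_a, image_b, user_2))  # B
```

With the same image pair, the two hypothetical users reach opposite verdicts because they weight different dimensions, which mirrors the situation depicted in Figure 1.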

Abstract

Recent text-to-image (T2I) models generate semantically coherent images from textual prompts, yet evaluating how well their outputs align with individual user preferences remains an open challenge. Conventional evaluation methods, such as general reward functions or similarity-based metrics, fail to capture the diversity and complexity of personal visual tastes. In this work, we present PIGReward, a personalized reward model that dynamically generates user-conditioned evaluation dimensions and assesses images through chain-of-thought (CoT) reasoning. To address the scarcity of user data, PIGReward adopts a self-bootstrapping strategy that reasons over limited reference data to construct rich user contexts, enabling personalization without user-specific training. Beyond evaluation, PIGReward provides personalized feedback that drives user-specific prompt optimization, improving alignment between generated images and individual intent. We further introduce PIGBench, a per-user preference benchmark capturing diverse visual interpretations of shared prompts. Extensive experiments demonstrate that PIGReward surpasses existing methods in both accuracy and interpretability, establishing a scalable, reasoning-based foundation for personalized T2I evaluation and optimization. Taken together, our findings highlight PIGReward as a robust step toward individually aligned T2I generation.
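The feedback loop for prompt optimization can be sketched as follows, assuming that each candidate prompt is rendered to an image and scored by a personalized reward for one user, and that the highest- and lowest-scoring candidates form the preferred–rejected pair used to refine the prompt model. The function `preferred_rejected_pair` and the keyword-counting `toy_reward` are hypothetical stand-ins; in PIGReward the score would come from context-aware evaluation of the generated image, not from the prompt text.

```python
from typing import Callable, Sequence, Tuple


def preferred_rejected_pair(
    candidates: Sequence[str],
    reward: Callable[[str], float],
) -> Tuple[str, str]:
    """Rank candidate prompts by a personalized reward and return
    (preferred, rejected) = (highest-scoring, lowest-scoring)."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidates to form a pair")
    ranked = sorted(candidates, key=reward, reverse=True)
    return ranked[0], ranked[-1]


# Hypothetical user who likes soft, painterly styles.
LIKED_TERMS = {"watercolor", "pastel"}


def toy_reward(prompt: str) -> float:
    """Toy stand-in for 'generate an image from `prompt` and score it for this user'."""
    words = prompt.lower().replace(",", " ").split()
    return float(sum(word in LIKED_TERMS for word in words))


candidates = [
    "a cat on a sofa, photorealistic",
    "a cat on a sofa, watercolor, pastel tones",
    "a cat on a sofa, watercolor",
]
preferred, rejected = preferred_rejected_pair(candidates, toy_reward)
print("preferred:", preferred)
print("rejected:", rejected)
```

Picking the extremes gives the clearest contrast for a preference-based update; other pairing rules are equally plausible, and this summary does not state the exact rule the paper uses.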

Paper Structure

This paper contains 14 sections, 7 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Each image in the pair is evaluated across multiple dimensions per user. Since the two users prioritize different criteria, their overall judgments of which image is better differ.
  • Figure 2: Inference process in PIGReward. The preference reasoner $\pi$ first builds a personalized context, then the reward model $\phi$ performs context-aware evaluation of the target image pair.
  • Figure 3: Training process for the preference reasoner $\pi$. We fine-tune $\pi$ with DPO on generated contrastive rationale pairs, enabling it to produce multifaceted explanations of user preferences.
  • Figure 4: Training process for the reward model $\phi$. (a) We generate CoT-formatted data with GPT-4o and discard any invalid or incorrect samples. (b) The resulting high-quality dataset is used to supervise $\phi$, distilling the personalized reasoning process.
  • Figure 5: An overview of the personalized prompt optimization framework. Here, PIGReward offers user-conditioned guidance for refining the prompt model by identifying preferred–rejected pairs.
  • ...and 6 more figures
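Figure 3 states that the preference reasoner $\pi$ is fine-tuned with DPO on contrastive rationale pairs. The per-pair loss of standard DPO is sketched below in plain Python. The inputs are assumed to be sequence-level log-likelihoods of the preferred and rejected rationales under $\pi$ and under a frozen reference model; the default beta = 0.1 is an assumption, not a value taken from the paper.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; the paper's setting is not stated here
) -> float:
    """Standard DPO loss for a single preference pair:

    loss = -log sigmoid(beta * [(log pi(y_w|x) - log pi_ref(y_w|x))
                               - (log pi(y_l|x) - log pi_ref(y_l|x))])
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


# Policy identical to the reference: zero margin.
print(dpo_loss(-12.0, -12.0, -12.0, -12.0))
# Policy favors the chosen rationale more than the reference does.
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

When the policy matches the reference, the margin is zero and the loss equals ln 2; raising the likelihood of the preferred rationale relative to the reference pushes the loss below that value, which is the signal that teaches $\pi$ to favor the better explanation.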