Personalized Reward Modeling for Text-to-Image Generation
Jeongeun Lee, Ryang Heo, Dongha Lee
TL;DR
The paper tackles the problem of aligning text-to-image (T2I) generation with individual user preferences. It introduces PIGReward, a personalized reward model built on a large vision-language model (LVLM) that derives user-conditioned evaluation dimensions through reasoning and a self-bootstrapping strategy, and PIGBench, a per-user preference benchmark. The model comprises a preference reasoner that constructs the user context and a reward model that performs structured, multi-dimensional evaluation, enabling personalized feedback for prompt optimization without user-specific training. Experiments show that PIGReward outperforms similarity-based and non-personalized baselines in accuracy and interpretability, demonstrating a scalable path toward individually aligned T2I generation.
Abstract
Recent text-to-image (T2I) models generate semantically coherent images from textual prompts, yet evaluating how well they align with individual user preferences remains an open challenge. Conventional evaluation methods, such as general reward functions and similarity-based metrics, fail to capture the diversity and complexity of personal visual tastes. In this work, we present PIGReward, a personalized reward model that dynamically generates user-conditioned evaluation dimensions and assesses images through chain-of-thought (CoT) reasoning. To address the scarcity of user data, PIGReward adopts a self-bootstrapping strategy that reasons over limited reference data to construct rich user contexts, enabling personalization without user-specific training. Beyond evaluation, PIGReward provides personalized feedback that drives user-specific prompt optimization, improving alignment between generated images and individual intent. We further introduce PIGBench, a per-user preference benchmark capturing diverse visual interpretations of shared prompts. Extensive experiments demonstrate that PIGReward surpasses existing methods in both accuracy and interpretability, establishing a scalable, reasoning-based foundation for personalized T2I evaluation and optimization. Taken together, our findings highlight PIGReward as a robust step toward individually aligned T2I generation.
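The two-stage flow described above (build a user context, then score images along user-conditioned dimensions) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `UserContext`, `build_user_context`, `score_image`, and `prefer` are hypothetical, the LVLM calls are replaced by a caller-supplied `scorer` function, and averaging per-dimension scores is an assumed aggregation rule chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class UserContext:
    """Hypothetical container for user-conditioned evaluation dimensions."""
    dimensions: List[str]


# Hypothetical scorer signature: (image_id, dimension) -> score in [0, 1].
# In a real system this would be an LVLM call; here it is supplied by the caller.
Scorer = Callable[[str, str], float]


def build_user_context(reference_notes: List[str]) -> UserContext:
    """Stand-in for the preference reasoner.

    Normalizes free-text notes and drops duplicates, preserving order.
    """
    dimensions: List[str] = []
    for note in reference_notes:
        key = note.strip().lower()
        if key and key not in dimensions:
            dimensions.append(key)
    return UserContext(dimensions=dimensions)


def score_image(image_id: str, ctx: UserContext, scorer: Scorer) -> Dict[str, float]:
    """Score one image on every dimension in the user context."""
    return {dim: scorer(image_id, dim) for dim in ctx.dimensions}


def prefer(image_a: str, image_b: str, ctx: UserContext, scorer: Scorer) -> str:
    """Return the image with the higher mean score (assumed aggregation rule).

    Ties are resolved in favor of image_a.
    """
    if not ctx.dimensions:
        raise ValueError("user context has no evaluation dimensions")
    scores_a = score_image(image_a, ctx, scorer)
    scores_b = score_image(image_b, ctx, scorer)
    mean_a = sum(scores_a.values()) / len(scores_a)
    mean_b = sum(scores_b.values()) / len(scores_b)
    return image_a if mean_a >= mean_b else image_b
```

Keeping the per-dimension scores in a dictionary, rather than returning a single scalar, mirrors the structured, multi-dimensional evaluation the abstract describes and leaves room for reporting which dimensions drove a preference.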
