PreferThinker: Reasoning-based Personalized Image Preference Assessment

Shengqi Xu, Xinpeng Zhou, Yabo Zhang, Ming Liu, Tao Liang, Tianyu Zhang, Yalong Bai, Zuxuan Wu, Wangmeng Zuo

TL;DR

This work tackles personalized image preference assessment under limited per-user data by introducing a Visual Preference Profile that acts as a cross-user bridge. It presents PreferThinker, a predict-then-assess CoT-style framework that first predicts a user’s multi-dimensional profile from reference images and then generates interpretable, multi-dimensional scores for candidate images. The approach is supported by the PreferImg-CoT dataset and a two-stage training pipeline combining cold-start supervised fine-tuning with Group Relative Policy Optimization reinforcement learning, plus a similarity-aware reward to improve profile prediction. Empirical results on PreferImg and a real-user PickaPic benchmark show superior performance and interpretability, with strong generalization to unseen users and robustness to varying amounts of prior information, suggesting practical value for personalized image recommendations and generation.

Abstract

Personalized image preference assessment aims to evaluate an individual user's image preferences by relying only on a small set of reference images as prior information. Existing methods mainly focus on general preference assessment, training models with large-scale data to tackle well-defined tasks such as text-image alignment. However, these approaches struggle to handle personalized preference because user-specific data are scarce and not easily scalable, and individual tastes are often diverse and complex. To overcome these challenges, we introduce a common preference profile that serves as a bridge across users, allowing large-scale user data to be leveraged for training profile prediction and capturing complex personalized preferences. Building on this idea, we propose a reasoning-based personalized image preference assessment framework that follows a predict-then-assess paradigm: it first predicts a user's preference profile from reference images, and then provides interpretable, multi-dimensional scores and assessments of candidate images based on the predicted profile. To support this, we first construct a large-scale Chain-of-Thought (CoT)-style personalized assessment dataset annotated with diverse user preference profiles and high-quality CoT-style reasoning, enabling explicit supervision of structured reasoning. Next, we adopt a two-stage training strategy: a cold-start supervised fine-tuning phase to equip the model with structured reasoning capabilities, followed by reinforcement learning to incentivize the model to explore more reasonable assessment paths and enhance generalization. Furthermore, we propose a similarity-aware prediction reward to encourage better prediction of the user's preference profile, which in turn facilitates the exploration of more reasonable assessments. Extensive experiments demonstrate the superiority of the proposed method.
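The reinforcement learning stage is described in the TL;DR as Group Relative Policy Optimization (GRPO). As a rough illustration of the "group relative" idea only, the sketch below normalizes the rewards of several responses sampled for the same prompt against that group's own mean and standard deviation. The function name, the epsilon value, the use of the population standard deviation, and the example rewards are all assumptions made for illustration; none of them is taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each response's reward against its own sampling group.

    Assumption: population standard deviation plus a small epsilon to
    avoid division by zero when all rewards in the group are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses sampled for one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so no separate value model is needed to form a baseline.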

Paper Structure

This paper contains 32 sections, 8 equations, 29 figures, 6 tables.

Figures (29)

  • Figure 1: Illustration of challenges and motivation. (a) The general preference data is easily scalable since users share common assessment criteria, while personalized preference data for each user is typically limited and unscalable, as each user's preferences are distinct. Besides, general preferences are often clear (e.g. text-image alignment and aesthetics), while personalized preferences are typically complex and diverse. (b) We propose a preference profile comprising multiple common visual elements, based on the observation that although each user's personalized preferences are unique, the key visual elements that shape them are shared and can therefore serve as a bridge to connect users.
  • Figure 2: Examples of PreferThinker for personalized image preference assessment. In the think stage, red text denotes alignment with preference profiles, while blue text denotes alignment with non-preference profiles. See the appendix for more complete reasoning examples.
  • Figure 3: Key visual elements of the preference profile. (a) User study results show that color, art style, art medium, saturation, and detail were voted the five key elements representing the visual preference profile. (b) Word clouds show that each element has a rich vocabulary associated with it.
  • Figure 4: Illustration of the proposed dataset PreferImg-CoT. (a) Personalized preference data generation pipeline. (b) Overview of CoT-style dataset construction: Claude annotation and data filtering. (c) Prompt design for Claude 3.7 to generate CoT-style response. (d) CoT-style response template, including preference profiles prediction, multi-dimensional assessment and answer.
  • Figure 5: Illustration of training strategy and proposed prediction reward: (a) Cold-start SFT to teach structured reasoning; (b) RL-based post-training to explore more reasonable assessments and enhance model generalization. (c) Similarity-aware prediction reward for better preference profile prediction.
  • ...and 24 more figures
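
The captions for Figures 3 and 5 mention a five-element preference profile and a similarity-aware prediction reward, but this listing does not give the reward's formula. The sketch below is therefore only a stand-in: it represents a profile as a mapping from each of the five elements to a list of descriptive terms and scores a predicted profile by the average Jaccard overlap with a target profile. The data layout, the choice of Jaccard similarity, the function names, and the example terms are all hypothetical and are not the authors' method.

```python
# The five elements named in the Figure 3 caption.
PROFILE_ELEMENTS = ("color", "art style", "art medium", "saturation", "detail")


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as a perfect match."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prediction_reward(predicted, target):
    """Average per-element similarity (assumed stand-in for the paper's reward)."""
    scores = [
        jaccard(predicted.get(name, ()), target.get(name, ()))
        for name in PROFILE_ELEMENTS
    ]
    return sum(scores) / len(scores)


# Hypothetical profiles: the prediction misses one of two color terms.
target = {
    "color": ["warm", "pastel"],
    "art style": ["anime"],
    "art medium": ["watercolor"],
    "saturation": ["low"],
    "detail": ["high"],
}
predicted = dict(target, color=["warm"])
reward = prediction_reward(predicted, target)
```

A graded score of this kind gives partial credit for a near-miss profile, which is the general motivation for a similarity-based reward over an exact-match one; a real system would more plausibly compare terms with an embedding model than with exact string overlap.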