PreferThinker: Reasoning-based Personalized Image Preference Assessment
Shengqi Xu, Xinpeng Zhou, Yabo Zhang, Ming Liu, Tao Liang, Tianyu Zhang, Yalong Bai, Zuxuan Wu, Wangmeng Zuo
TL;DR
This work tackles personalized image preference assessment under limited per-user data by introducing a Visual Preference Profile that acts as a bridge across users. It presents PreferThinker, a predict-then-assess, CoT-style framework that first predicts a user’s multi-dimensional profile from reference images and then generates interpretable, multi-dimensional scores for candidate images. The approach is supported by the PreferImg-CoT dataset and a two-stage training pipeline that combines cold-start supervised fine-tuning with reinforcement learning via Group Relative Policy Optimization (GRPO), plus a similarity-aware reward to improve profile prediction. Empirical results on PreferImg and a real-user PickaPic benchmark show superior performance and interpretability, with strong generalization to unseen users and robustness to varying amounts of prior information, suggesting practical value for personalized image recommendation and generation.
Abstract
Personalized image preference assessment aims to evaluate an individual user's image preferences by relying only on a small set of reference images as prior information. Existing methods mainly focus on general preference assessment, training models with large-scale data to tackle well-defined tasks such as text-image alignment. However, these approaches struggle to handle personalized preference because user-specific data are scarce and not easily scalable, and individual tastes are often diverse and complex. To overcome these challenges, we introduce a common preference profile that serves as a bridge across users, allowing large-scale user data to be leveraged for training profile prediction and capturing complex personalized preferences. Building on this idea, we propose a reasoning-based personalized image preference assessment framework that follows a predict-then-assess paradigm: it first predicts a user's preference profile from reference images, and then provides interpretable, multi-dimensional scores and assessments of candidate images based on the predicted profile. To support this, we first construct a large-scale Chain-of-Thought (CoT)-style personalized assessment dataset annotated with diverse user preference profiles and high-quality CoT-style reasoning, enabling explicit supervision of structured reasoning. Next, we adopt a two-stage training strategy: a cold-start supervised fine-tuning phase to equip the model with structured reasoning capabilities, followed by reinforcement learning to incentivize the model to explore more reasonable assessment paths and enhance generalization. Furthermore, we propose a similarity-aware prediction reward to encourage better prediction of the user's preference profile, which in turn facilitates the exploration of more reasonable assessments. Extensive experiments demonstrate the superiority of the proposed method.
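As a rough illustration of the reinforcement learning stage, the sketch below shows the group-relative advantage normalization that is generally associated with Group Relative Policy Optimization: several responses are sampled for the same prompt, each receives a scalar reward, and each reward is standardized against the group's mean and standard deviation. The function name, the `eps` parameter, and the example reward values are assumptions made for illustration; the paper's actual reward design (including the similarity-aware prediction reward) is not reproduced here.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Hypothetical helper: `rewards` holds one scalar reward per response
    sampled for the same prompt. `eps` is an assumed small constant that
    avoids division by zero when all rewards in the group are equal.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population variance over the group.
    variance = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative rewards for four sampled assessments of one prompt.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses with above-average reward receive positive advantages and the rest receive negative ones, so the policy update favors the better assessments within each group without requiring a separate value model.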
