When Personalization Meets Reality: A Multi-Faceted Analysis of Personalized Preference Learning
Yijiang River Dong, Tiancheng Hu, Yinhong Liu, Ahmet Üstün, Nigel Collier
TL;DR
RLHF typically assumes homogeneous user preferences. This work addresses that limitation by proposing a multi-faceted evaluation framework for personalized preference learning in open-domain LLMs. It analyzes eight personalization methods across three general-domain datasets, with attention to dataset properties such as inter-personal disagreement, intra-personal consistency, minority viewpoints, and room for personalization. Key findings show that personalized reward modeling can outperform baselines, yet personalization can cause up to 20% safety misalignment and, in some cases, degrade core reasoning, underscoring a personalization tax. The study also highlights the efficacy of meta-learning approaches such as GPO for adapting to new users, and cautions that holistic evaluation is essential for developing inclusive, safe, and robust personalized systems for diverse global users.
Abstract
While Reinforcement Learning from Human Feedback (RLHF) is widely used to align Large Language Models (LLMs) with human preferences, it typically assumes homogeneous preferences across users, overlooking diverse human values and minority viewpoints. Although personalized preference learning addresses this by tailoring separate preferences for individual users, the field lacks standardized methods to assess its effectiveness. We present a multi-faceted evaluation framework that measures not only performance but also fairness, unintended effects, and adaptability across varying levels of preference divergence. Through extensive experiments comparing eight personalization methods across three preference datasets, we demonstrate that performance differences between methods can reach 36% when users strongly disagree, and that personalization can introduce up to 20% safety misalignment. These findings highlight the critical need for holistic evaluation approaches to advance the development of more effective and inclusive preference learning systems.
