A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie, Junda Wu, Yiran Shen, Yu Xia, Xintong Li, Aaron Chang, Ryan Rossi, Sachin Kumar, Bodhisattwa Prasad Majumder, Jingbo Shang, Prithviraj Ammanabrolu, Julian McAuley
TL;DR
The paper surveys personalized and pluralistic alignment for LLMs, arguing that universal alignment is insufficient and proposing a taxonomy that spans training-time, test-time, and user-modeling approaches. It formalizes the problem with per-user reward functions and discusses how shared parameters, steerable prompts, memory, and decoding-time interventions can capture per-user preferences. The review covers training-time methods (user-specific adapters, heads, mixture-of-experts, and reward models) and test-time methods (prompting, reward-guided decoding, and logit rectification), along with data challenges and evaluation gaps. It also highlights the role of personalized user modeling in enabling and evaluating personalized alignment, and outlines future directions including online continual personalization, handling long, complex value statements, and the need for standardized benchmarks to gauge progress in this rapidly evolving field.
Abstract
Personalized preference alignment for large language models (LLMs), the process of tailoring LLMs to individual users' preferences, is an emerging research direction spanning the areas of NLP and personalization. In this survey, we present an analysis of works on personalized alignment and modeling for LLMs. We introduce a taxonomy of preference alignment techniques, including training-time, inference-time, and user-modeling-based methods. We analyze and discuss the strengths and limitations of each group of techniques, and then cover evaluation, benchmarks, and open problems in the field.
