Drift: Decoding-time Personalized Alignments with Implicit User Preferences
Minbeom Kim, Kang-il Lee, Seongho Joo, Hwaran Lee, Thibaut Thonet, Kyomin Jung
TL;DR
Drift introduces a training-free, decoding-time personalization framework that models user preferences as a weighted composition of interpretable attributes. It estimates the attribute weights with a gradient-free, few-shot procedure based on the Bradley-Terry preference model and scores each attribute zero-shot via differential prompts. It then combines these signals into a composite logit adjustment that steers generation without retraining the base LLM. The decoding rule $\tilde{\pi}(w) = \mathrm{softmax}\Big(h^{\mathrm{LLM}}(w) + \frac{1}{\beta}\sum_i p_i\,\big(h^{i}(w) - h^{\mathrm{base}}(w)\big)\Big)$, where $p_i$ is the estimated weight of attribute $i$ and $h^{i}(w) - h^{\mathrm{base}}(w)$ is that attribute's differential-prompt reward, enables efficient, interpretable personalization from limited data. Experiments on Perspective and PRISM show that Drift outperforms RLHF baselines with as few as 50-100 examples while offering favorable inference efficiency and interpretability, suggesting practical potential for personalized AI services. The paper also discusses open challenges and future directions, including online benchmarks, mapping attributes to users, biases in differential prompting, and tokenizer dependencies, framing a roadmap for implicit personalization research.
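The decoding rule can be made concrete with a minimal pure-Python sketch of a single decoding step. It assumes all logit vectors are defined over the same vocabulary (the tokenizer dependency noted in the TL;DR). The function and parameter names (`drift_next_token_dist`, `h_attrs`, `weights`, `beta`) are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Sequence

# Hypothetical sketch of one Drift-style decoding step, not the authors' code.


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def drift_next_token_dist(
    h_llm: Sequence[float],
    h_attrs: Sequence[Sequence[float]],
    h_base: Sequence[float],
    weights: Sequence[float],
    beta: float,
) -> list[float]:
    """Return softmax(h_llm + (1/beta) * sum_i p_i * (h_i - h_base)).

    h_llm   -- next-token logits of the frozen LLM
    h_attrs -- one logit vector per attribute (attribute-i prompt)
    h_base  -- logits under the base prompt
    weights -- attribute weights p_i
    beta    -- positive strength parameter; larger beta = weaker steering
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(h_attrs) != len(weights):
        raise ValueError("need exactly one weight per attribute")
    vocab = len(h_llm)
    # Assumes every vector shares one vocabulary (same tokenizer).
    if len(h_base) != vocab or any(len(h) != vocab for h in h_attrs):
        raise ValueError("all logit vectors must have the same length")
    adjusted = list(h_llm)
    for p_i, h_i in zip(weights, h_attrs):
        for w in range(vocab):
            # Add the weighted differential reward for token w.
            adjusted[w] += (p_i / beta) * (h_i[w] - h_base[w])
    return softmax(adjusted)


# Toy 3-token vocabulary: attribute 0 favors token 0.
dist = drift_next_token_dist(
    h_llm=[1.0, 1.0, 1.0],
    h_attrs=[[2.0, 0.0, 0.0]],
    h_base=[0.0, 0.0, 0.0],
    weights=[1.0],
    beta=1.0,
)
print([round(p, 3) for p in dist])
```

With a flat base distribution, a positive weight on an attribute that favors token 0 shifts probability mass toward that token. A weight of 0 recovers the base LLM's distribution, and a larger `beta` shrinks the adjustment.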
Abstract
Personalized alignment for individual users has been a long-standing goal for large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time with implicit user preferences. Traditional Reinforcement Learning from Human Feedback (RLHF) requires thousands of annotated examples and expensive gradient updates. In contrast, Drift personalizes LLMs in a training-free manner, using only a few dozen examples to steer a frozen model through efficient preference modeling. Our approach models user preferences as a composition of predefined, interpretable attributes and aligns them at decoding time to enable personalized generation. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift significantly outperforms RLHF baselines while using only 50-100 examples. Our results and analysis show that Drift is both computationally efficient and interpretable.
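Modeling preferences as a weighted composition of attributes, fitted gradient-free from a few examples, can be illustrated with a small weight estimator. This sketch assumes that per-attribute rewards for each chosen and rejected response are already available (for instance, from differential-prompt scores). It scores candidate weight vectors by Bradley-Terry log-likelihood over a few pairwise comparisons and picks the best by exhaustive grid search on the probability simplex. The grid search and all names (`bt_log_likelihood`, `simplex_grid`, `estimate_weights`) are hypothetical stand-ins, and the paper's actual estimation procedure may differ.

```python
import itertools
import math
from typing import Iterator, Sequence

# Hypothetical sketch: gradient-free Bradley-Terry weight estimation.
# Each pair is (r_1(chosen) - r_1(rejected), ..., r_k(chosen) - r_k(rejected)).


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def bt_log_likelihood(weights: Sequence[float], reward_diffs: Sequence[Sequence[float]]) -> float:
    """Bradley-Terry log-likelihood that every chosen response beats its rejected one."""
    return sum(
        log_sigmoid(sum(p * d for p, d in zip(weights, diff)))
        for diff in reward_diffs
    )


def simplex_grid(k: int, steps: int) -> Iterator[tuple[float, ...]]:
    """Yield weight vectors with entries in {0, 1/steps, ..., 1} that sum to 1."""
    for combo in itertools.product(range(steps + 1), repeat=k):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def estimate_weights(reward_diffs: Sequence[Sequence[float]], steps: int = 10) -> tuple[float, ...]:
    """Pick the grid point with the highest Bradley-Terry log-likelihood (no gradients)."""
    if not reward_diffs:
        raise ValueError("need at least one preference pair")
    k = len(reward_diffs[0])
    return max(simplex_grid(k, steps), key=lambda w: bt_log_likelihood(w, reward_diffs))


# Three labeled pairs over two attributes; attribute 0 tracks the user's choices.
pairs = [(1.0, -0.5), (0.8, 0.2), (1.2, -1.0)]
print(estimate_weights(pairs))  # prints (1.0, 0.0)
```

Exhaustive grid search is cheap for a handful of attributes but visits (steps + 1)^k candidates, so a larger attribute set would call for a more efficient derivative-free optimizer.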
