Drift: Decoding-time Personalized Alignments with Implicit User Preferences

Minbeom Kim, Kang-il Lee, Seongho Joo, Hwaran Lee, Thibaut Thonet, Kyomin Jung

TL;DR

Drift introduces a training-free, decoding-time personalization framework that models user preferences as a weighted composition of interpretable attributes. By estimating attribute weights with a gradient-free, few-shot approach based on Bradley-Terry and zero-shot rewarding via differential prompts, Drift constructs a composite logit adjustment to steer generation without retraining the base LLM. The decoding rule $\tilde{\pi}(w) = \mathrm{softmax}\left(h^{\mathrm{LLM}}(w) + \frac{1}{\beta} \sum_i p_i \left(h^{i}(w) - h^{\mathrm{base}}(w)\right)\right)$ enables efficient, interpretable personalization that scales with limited data. Experiments on Perspective and PRISM show Drift outperforms RLHF baselines with as few as 50-100 examples and demonstrates favorable inference efficiency and interpretability, suggesting practical potential for personalized AI services. Drift also highlights challenges and future directions, including online benchmarks, attribute-user mapping, biases in differential prompting, and tokenizer dependencies, framing a roadmap for implicit personalization research.
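
To make the decoding rule in the TL;DR concrete, here is a minimal NumPy sketch of a single next-token step. It is an illustration under stated assumptions, not the authors' implementation: the function name `drift_next_token_probs`, the toy four-token vocabulary, and all numeric values are hypothetical, and the logit vectors are assumed to have been obtained already from forward passes with the plain, attribute-specific, and base prompts.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()


def drift_next_token_probs(h_llm, h_attr, h_base, p, beta):
    """Apply softmax(h_llm + (1/beta) * sum_i p_i * (h_attr[i] - h_base)).

    h_llm  : (V,)   logits of the frozen LLM
    h_attr : (k, V) logits under each attribute-specific prompt (assumed given)
    h_base : (V,)   logits under the base prompt (assumed given)
    p      : (k,)   attribute weights for one user
    beta   : float > 0, scales the strength of the adjustment
    """
    h_llm = np.asarray(h_llm, dtype=float)
    h_attr = np.asarray(h_attr, dtype=float)
    h_base = np.asarray(h_base, dtype=float)
    p = np.asarray(p, dtype=float)
    # Weighted sum of per-attribute logit differences, shape (V,).
    shift = p @ (h_attr - h_base)
    return softmax(h_llm + shift / beta)


# Toy example with a 4-token vocabulary and 2 attributes (made-up numbers).
h_llm = np.array([2.0, 1.0, 0.5, 0.0])
h_base = np.array([1.0, 1.0, 1.0, 1.0])
h_attr = np.array([[1.0, 1.0, 3.0, 1.0],   # attribute 0 favors token 2
                   [1.0, 2.0, 1.0, 1.0]])  # attribute 1 favors token 1
p = np.array([0.8, -0.2])                  # user likes attribute 0, mildly dislikes 1

plain = softmax(h_llm)
probs = drift_next_token_probs(h_llm, h_attr, h_base, p, beta=1.0)
```

In this toy setup the unadjusted distribution prefers token 0, while the adjusted one prefers token 2, because the positively weighted attribute raises that token's logit. Setting all weights to zero recovers the frozen model's distribution, and a larger `beta` shrinks the adjustment.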

Abstract

Personalized alignments for individual users have been a long-standing goal in large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time with implicit user preferences. Traditional Reinforcement Learning from Human Feedback (RLHF) requires thousands of annotated examples and expensive gradient updates. In contrast, Drift personalizes LLMs in a training-free manner, using only a few dozen examples to steer a frozen model through efficient preference modeling. Our approach models user preferences as a composition of predefined, interpretable attributes and aligns them at decoding time to enable personalized generation. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift significantly outperforms RLHF baselines while using only 50-100 examples. Our results and analysis show that Drift is both computationally efficient and interpretable.
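
As a rough illustration of what "a composition of predefined, interpretable attributes" could mean under the Bradley-Terry model named in the TL;DR, the sketch below scores a preference pair with a composite reward $\sum_i p_i r_i$ and converts the reward margin into a preference probability. How Drift actually estimates the weights is not described here, so the sketch only evaluates given weights. The function names, the reward arrays, and the weights are all hypothetical.

```python
import numpy as np


def bt_preference_prob(p, r_chosen, r_rejected):
    """Bradley-Terry probability that the chosen response beats the rejected one.

    p          : (k,)   attribute weights
    r_chosen   : (n, k) per-attribute rewards of the chosen responses (assumed given)
    r_rejected : (n, k) per-attribute rewards of the rejected responses (assumed given)
    """
    p = np.asarray(p, dtype=float)
    diff = np.asarray(r_chosen, dtype=float) - np.asarray(r_rejected, dtype=float)
    margin = diff @ p                      # composite reward difference, shape (n,)
    return 1.0 / (1.0 + np.exp(-margin))   # sigmoid of the margin


def preference_accuracy(p, r_chosen, r_rejected):
    """Fraction of pairs where the composite reward ranks the chosen response higher."""
    return float(np.mean(bt_preference_prob(p, r_chosen, r_rejected) > 0.5))


# Three made-up preference pairs scored on two attributes.
r_chosen = np.array([[1.0, 0.2], [0.5, 0.9], [0.8, 0.1]])
r_rejected = np.array([[0.2, 0.4], [0.1, 1.0], [0.3, 0.6]])

acc_attr0 = preference_accuracy([1.0, 0.0], r_chosen, r_rejected)
acc_attr1 = preference_accuracy([0.0, 1.0], r_chosen, r_rejected)
```

With these toy numbers, weighting only the first attribute explains every pair and weighting only the second explains none, which is the kind of signal a few-shot weight estimate would have to pick up from a handful of examples.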

Paper Structure

This paper contains 57 sections, 34 equations, 5 figures, 10 tables, 2 algorithms.

Figures (5)

  • Figure 1: Overview of the complete Drift algorithm. (a) Drift Approximation: Decomposes a user’s implicit preferences into a weighted combination of various attributes. (b) Drift Decoding: Integrates this attribute composition into the decoding process without retraining the LLM.
  • Figure 2: Average $k$-shot preference modeling results across personas in the Perspective and PRISM datasets. The two figures on the left show the results for Perspective using Llama 1B and Gemma 2B; the two on the right for PRISM using Llama 1B and Gemma 2B.
  • Figure 3: Performance variation when reducing the number of attributes during Drift Approximation with 40 samples. The performance decline is slightly more pronounced in the PRISM dataset, suggesting that real users’ implicit preferences are more complex than those of synthetic personas.
  • Figure 4: Few-shot preference modeling results for user1008 in the PRISM dataset with quadratic programming (QP) and logistic regression (LQ).
  • Figure 5: For each user in PRISM, there is a $W-L$ (Win-Loss) value for each attribute. The higher this value is, the more that user can be interpreted as preferring that attribute.