Orchestrating LLMs with Different Personalizations
Jin Peng Zhou, Katie Z Luo, Jingwen Gu, Jason Yuan, Kilian Q. Weinberger, Wen Sun
TL;DR
This work tackles personalizing large language models (LLMs) to individual user preferences without retraining. It introduces Mixture of Preference Experts (MoPE), a black-box, token-level output-merging framework in which a lightweight Preference Control Model (PCM) assigns per-token weights that combine the next-token distributions of frozen expert LLMs. Rewards for each preference dimension are modeled with Bradley-Terry normalization, and the PCM is optimized via online reinforcement learning (REBEL) to maximize multi-dimensional utility. Empirical results on the Koala dataset show MoPE achieves state-of-the-art performance compared with prompting and weight-merging baselines, offering scalable and practical personalization for proprietary or closed models.
Abstract
This paper presents a novel approach to aligning large language models (LLMs) with individual human preferences, sometimes referred to as Reinforcement Learning from Personalized Human Feedback (RLPHF). Given stated preferences along multiple dimensions, such as helpfulness, conciseness, or humor, the goal is to obtain, without retraining, an LLM that best adheres to this specification. Starting from specialized expert LLMs, each trained for one particular preference dimension, we propose a black-box method that merges their outputs at the token level. We train a lightweight Preference Control Model (PCM) that dynamically translates the preference description and current context into next-token prediction weights. By combining the expert models' outputs token by token, our approach generates text that optimizes for the given preference. Empirical tests show that our method matches or surpasses existing preference merging techniques, providing a scalable, efficient alternative to fine-tuning LLMs for individual personalization.
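To make the token-level merging concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the PCM emits one unnormalized score per expert at each decoding step, that these scores are normalized with a softmax, and that the expert next-token distributions are merged as a convex combination. The names `softmax` and `mix_next_token`, the toy three-token vocabulary, and the example numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_next_token(expert_probs, pcm_scores):
    """Merge per-expert next-token distributions for one decoding step.

    expert_probs: one probability vector per frozen expert, all over the
                  same vocabulary (only output distributions are needed,
                  which is what makes the approach black-box).
    pcm_scores:   one raw score per expert, standing in for the output of
                  a Preference Control Model (assumed interface).
    """
    weights = softmax(pcm_scores)  # assumption: softmax-normalized weights
    vocab_size = len(expert_probs[0])
    mixed = [0.0] * vocab_size
    for w, probs in zip(weights, expert_probs):
        for i, p in enumerate(probs):
            mixed[i] += w * p  # convex combination of expert probabilities
    return weights, mixed


# Toy example: two experts over a three-token vocabulary.
expert_probs = [
    [0.7, 0.2, 0.1],  # e.g. an expert tuned for conciseness
    [0.1, 0.3, 0.6],  # e.g. an expert tuned for humor
]
weights, mixed = mix_next_token(expert_probs, pcm_scores=[0.0, 0.0])
```

With equal scores, both experts receive weight 0.5 and the merged distribution is the average of the two. In an actual decoding loop the scores would be recomputed at every step from the preference description and the current context, so the mixture can shift from token to token.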
