Online Personalizing White-box LLMs Generation with Neural Bandits
Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse
TL;DR
This work addresses scalable personalization of open-ended LLM generation by optimizing soft instruction embeddings online with neural bandits. By updating soft prompts directly with NeuralUCB and NeuralTS in a white-box LLM, the approach improves results on the LaMP benchmark across multiple personalized tasks, with gains of up to 62.9% in best ROUGE scores and up to 2.76% in LLM-agent evaluation on news headline generation. This points to a practical path to per-user customization without training a separate model for each user, though the authors acknowledge that the evaluation scope is limited and that human judgments are still needed. The work also raises ethical considerations around privacy, bias, and potential misuse, and calls for further study across diverse tasks and with real-world user feedback.
Abstract
The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to meet individual preferences without the unsustainable demand of creating a unique model for each user. This study introduces an innovative online method that employs neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, enhancing the personalization of open-ended text generation by white-box LLMs. Through rigorous experimentation on various tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, leads to substantial enhancements in personalized news headline generation, achieving up to a 62.9% improvement in best ROUGE scores and up to a 2.76% increase in LLM-agent evaluation over the baseline.
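To make the bandit idea concrete, the following is a minimal sketch of a NeuralUCB-style selection loop over candidate soft-prompt embeddings. It is an illustration under stated assumptions, not the authors' implementation: the class name TinyNeuralUCB, the one-hidden-layer tanh reward model, the hyperparameters, and the random candidate vectors are all hypothetical, and the width-dependent scaling used in the original NeuralUCB formulation is omitted for brevity. In practice the reward would come from user feedback on the text generated with the chosen embedding.

```python
import numpy as np


class TinyNeuralUCB:
    """Hypothetical one-hidden-layer reward model with a UCB exploration bonus."""

    def __init__(self, dim, hidden=8, lam=1.0, gamma=1.0, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(hidden, dim))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
        self.gamma, self.lr = gamma, lr
        n_params = self.W1.size + self.w2.size
        # Inverse of the design matrix Z, initialised as Z = lam * I.
        self.Z_inv = np.eye(n_params) / lam

    def predict(self, x):
        """Predicted reward for a candidate embedding x."""
        return float(self.w2 @ np.tanh(self.W1 @ x))

    def grad(self, x):
        """Gradient of the predicted reward w.r.t. all parameters, flattened."""
        h = np.tanh(self.W1 @ x)
        g_W1 = np.outer(self.w2 * (1.0 - h ** 2), x)
        return np.concatenate([g_W1.ravel(), h])

    def bonus(self, x):
        """Exploration bonus gamma * sqrt(g^T Z^{-1} g)."""
        g = self.grad(x)
        return self.gamma * float(np.sqrt(max(g @ self.Z_inv @ g, 0.0)))

    def select(self, candidates):
        """Pick the candidate with the highest upper confidence bound."""
        scores = [self.predict(x) + self.bonus(x) for x in candidates]
        return int(np.argmax(scores))

    def update(self, x, reward):
        """Incorporate one observed reward for the chosen candidate."""
        g = self.grad(x)
        # Sherman-Morrison rank-one update of Z^{-1} after Z <- Z + g g^T.
        Zg = self.Z_inv @ g
        self.Z_inv -= np.outer(Zg, Zg) / (1.0 + g @ Zg)
        # One SGD step on the squared error 0.5 * (prediction - reward)^2.
        step = self.lr * (self.predict(x) - reward) * g
        self.W1 -= step[: self.W1.size].reshape(self.W1.shape)
        self.w2 -= step[self.W1.size:]


# Usage: three random vectors stand in for candidate soft-prompt embeddings.
rng = np.random.default_rng(1)
candidates = [rng.normal(size=4) for _ in range(3)]
bandit = TinyNeuralUCB(dim=4)
arm = bandit.select(candidates)
bandit.update(candidates[arm], reward=1.0)  # reward: placeholder for user feedback
```

A Thompson-sampling variant in the spirit of NeuralTS would replace the additive bonus in `select` with a draw from a Gaussian centred on the predicted reward, with a standard deviation proportional to the same bonus term.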
