Online Personalizing White-box LLMs Generation with Neural Bandits

Zekai Chen, Weeden Daniel, Po-yu Chen, Francois Buet-Golfouse

TL;DR

The work addresses scalable personalization of open-ended LLM generation by optimizing soft instruction embeddings online with neural bandits. By directly updating soft prompts with NeuralUCB and NeuralTS in a white-box LLM, the approach achieves notable improvements on the LaMP benchmark across multiple personalized tasks, including substantial ROUGE gains and gains under LLM-based evaluation. This demonstrates a practical path to per-user customization without training separate models, though the authors acknowledge that the evaluation scope is limited and that human judgments are still needed. The authors also discuss ethical considerations around privacy, bias, and potential misuse, and call for further study across more diverse tasks and with real-world user feedback.
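To make the bandit loop concrete, here is a minimal sketch of how NeuralUCB and NeuralTS could choose among candidate soft instruction embeddings. It is not the authors' implementation. It assumes that each arm is a candidate soft-prompt vector and that user feedback arrives as a scalar reward. For brevity it also uses a one-hidden-layer ReLU network, a diagonal approximation of the gradient covariance matrix Z, and no L2 regulariser. The class `NeuralBanditSketch`, its parameters (`width`, `lam`, `nu`, `lr`), and the synthetic reward in the demo are all hypothetical.

```python
import math
import random


class NeuralBanditSketch:
    """Hypothetical NeuralUCB / NeuralTS selector over candidate soft-prompt vectors.

    Not the paper's code. Model: f(x) = sqrt(m) * w2 . relu(W1 x). Assumed
    simplifications: diagonal gradient covariance Z and no L2 regulariser.
    """

    def __init__(self, dim, width=16, lam=1.0, nu=0.1, lr=0.005, seed=0):
        self.rng = random.Random(seed)
        self.d, self.m = dim, width
        self.W1 = [[self.rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
                   for _ in range(width)]
        self.w2 = [self.rng.gauss(0, 1 / math.sqrt(width)) for _ in range(width)]
        self.lam, self.nu, self.lr = lam, nu, lr
        self.z_diag = [lam] * (width * dim + width)  # diagonal of Z_0 = lam * I
        self.history = []  # observed (soft_prompt, reward) pairs

    def predict(self, x):
        """Return f(x) plus the pre-activations and hidden units."""
        pre = [sum(w * xk for w, xk in zip(row, x)) for row in self.W1]
        hidden = [max(0.0, p) for p in pre]
        out = math.sqrt(self.m) * sum(a * h for a, h in zip(self.w2, hidden))
        return out, pre, hidden

    def grad(self, x):
        """Gradient of f w.r.t. all parameters, flattened as [W1 row-major, w2]."""
        _, pre, hidden = self.predict(x)
        s = math.sqrt(self.m)
        g_w1 = [s * self.w2[j] * (1.0 if pre[j] > 0 else 0.0) * xk
                for j in range(self.m) for xk in x]
        return g_w1 + [s * h for h in hidden]

    def sigma(self, g):
        """sqrt(g^T Z^{-1} g / m) under the diagonal approximation of Z."""
        return math.sqrt(sum(gi * gi / zi for gi, zi in zip(g, self.z_diag)) / self.m)

    def select(self, candidates, mode="ucb"):
        """Return the index of the candidate soft prompt to try next."""
        scores = []
        for x in candidates:
            f = self.predict(x)[0]
            s = self.sigma(self.grad(x))
            if mode == "ucb":  # NeuralUCB: optimistic upper confidence bound
                scores.append(f + self.nu * s)
            else:  # NeuralTS: one draw from N(f, nu^2 * lam * s^2)
                scores.append(self.rng.gauss(f, self.nu * math.sqrt(self.lam) * s))
        return max(range(len(candidates)), key=scores.__getitem__)

    def update(self, x, reward, epochs=20):
        """Record feedback for x, update Z, then refit f by plain SGD."""
        g = self.grad(x)  # gradient at the pre-update parameters
        self.z_diag = [z + gi * gi / self.m for z, gi in zip(self.z_diag, g)]
        self.history.append((x, reward))
        for _ in range(epochs):  # minimise 0.5 * (f(x) - r)^2 over the history
            for xi, ri in self.history:
                err = self.predict(xi)[0] - ri
                gi = self.grad(xi)
                for j in range(self.m):
                    for k in range(self.d):
                        self.W1[j][k] -= self.lr * err * gi[j * self.d + k]
                    self.w2[j] -= self.lr * err * gi[self.m * self.d + j]


if __name__ == "__main__":
    # Synthetic stand-in for user feedback: reward peaks at a hidden "style" vector.
    target = [0.8, 0.1, 0.5, 0.3]

    def user_reward(x):
        return 1.0 - sum((a - b) ** 2 for a, b in zip(x, target))

    rng = random.Random(1)
    bandit = NeuralBanditSketch(dim=4)
    for t in range(10):
        candidates = [[rng.random() for _ in range(4)] for _ in range(8)]
        chosen = candidates[bandit.select(candidates, mode="ts")]
        bandit.update(chosen, user_reward(chosen))
```

The two modes differ only in how exploration enters. NeuralUCB adds a deterministic bonus proportional to the gradient-based uncertainty. NeuralTS instead draws a score from a Gaussian whose spread is built from the same quantity.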

Abstract

The advent of personalized content generation by LLMs presents a novel challenge: how to efficiently adapt text to individual preferences without the unsustainable cost of creating a unique model for each user. This study introduces an online method that uses neural bandit algorithms to dynamically optimize soft instruction embeddings based on user feedback, improving the personalization of open-ended text generation by white-box LLMs. Through experiments on several tasks, we demonstrate significant performance improvements over baseline strategies. NeuralTS, in particular, yields substantial gains in personalized news headline generation, achieving up to a 62.9% improvement in best ROUGE scores and up to a 2.76% increase in LLM-agent evaluation over the baseline.
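The abstract reports its gains as a percentage improvement in "best ROUGE". The following is a hedged illustration, not the paper's evaluation code. It assumes the per-generation reward is ROUGE-1 F1 against the user's reference headline, computed on lowercased whitespace tokens without stemming. The paper may use a different ROUGE variant or a reference scorer. It also assumes "improvement" means relative improvement over the baseline score. Both function names are hypothetical, and the inputs in the tests are illustrative values, not results from the paper.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Hypothetical ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(method_score: float, baseline_score: float) -> float:
    """Percentage improvement of method_score over a nonzero baseline_score."""
    if baseline_score == 0:
        raise ValueError("baseline_score must be nonzero")
    return 100.0 * (method_score - baseline_score) / baseline_score
```

For example, "teens stay safe online" against "keeping teens safe online" shares three of four unigrams in each direction, so precision, recall, and F1 are all 0.75.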

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Evolution of generated headlines for an article on teen internet safety, illustrating how online learning progressively refines the generation to emulate this journalist's stylistic tendencies.
  • Figure 2: Illustration of our framework. Details are described in the method section.
  • Figure 3: Ten randomly selected user profiles (shown in different shades of blue). The averaged best reward (yellow dashed line) increases across learning iterations for three personalized text generation tasks, showing progressive improvement for both NeuralUCB [Zhou2019NeuralCB] and NeuralTS [zhang2021neural].
  • Figure 4: LLM-based evaluation comparing NeuralUCB [Zhou2019NeuralCB] and NeuralTS [zhang2021neural] on personalized news headline generation.
  • Figure 5: Prompts given to the black-box LLMs for human-like evaluation of the white-box LLM's generations, using personalized news headline generation as an example.
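Figure 3 plots an "averaged best reward" per learning iteration. One plausible reading, assumed here rather than taken from the paper, is as follows. For each user, take the running maximum of that user's reward up to iteration t. Then average those maxima over the selected users. The function name and input layout (one reward list per user, all of equal length) are hypothetical.

```python
from itertools import accumulate


def averaged_best_rewards(rewards_per_user: list[list[float]]) -> list[float]:
    """Running best reward per user, averaged across users at each iteration."""
    curves = [list(accumulate(rewards, max)) for rewards in rewards_per_user]
    n_iters = len(curves[0])
    if any(len(curve) != n_iters for curve in curves):
        raise ValueError("every user needs the same number of iterations")
    return [sum(curve[t] for curve in curves) / len(curves) for t in range(n_iters)]
```

Each user's curve is a running maximum, so the averaged curve is non-decreasing by construction. Its upward trend therefore shows that better generations were found at some iteration, not that every later generation was better than the one before.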