
Few-shot Personalization of LLMs with Mis-aligned Responses

Jaehyung Kim, Yiming Yang

TL;DR

Fermi introduces a few-shot personalization approach for LLMs that learns user-specific prompts from a user profile $U_{\text{pro}}$ and a small set of the user's previous opinions $U_{\text{opi}}$, using the LLM's mis-aligned responses as learning signals. The method iteratively scores candidate prompts, updates a memory that records both failures and successes, and generates improved prompts with a strong optimizer LLM $\mathcal{M}_{\text{opt}}$. At test time, Retrieval-of-Prompt selects among the learned prompts based on the query's context. Experiments on OpinionQA, GlobalOpinionQA, and LaMP tasks show that Fermi consistently outperforms strong baselines, with absolute improvements of up to about 6.8% on the QA benchmarks, gains on the other tasks, and good transferability across LLMs. Analyses confirm the importance of mis-aligned contexts, the prompt memory design, and retrieval-based prompt selection, supporting the practicality and robustness of the approach for privacy-conscious, per-user LLM personalization. The work also notes practical limitations, such as computational cost and the need for a strong optimizer model, while showing clear relevance for real-world personalized AI systems.
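The optimization loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `score_prompt`, `update_memory`, `optimize`, `toy_llm`, and `toy_optimizer` are hypothetical names; the score is assumed to be plain accuracy on the user's few opinions ($U_{\text{opi}}$); and the memory is assumed to keep the top-$k$ distinct prompts together with the opinions each one mis-predicted. The callables `llm` and `optimizer` stand in for the target LLM and for $\mathcal{M}_{\text{opt}}$.

```python
from dataclasses import dataclass, field
from typing import Callable

Example = tuple[str, str]            # (question, user's answer): a stand-in for U_opi
LLM = Callable[[str, str], str]      # hypothetical: (prompt, question) -> predicted answer


@dataclass
class MemoryEntry:
    prompt: str
    score: float
    misaligned: list[Example] = field(default_factory=list)  # opinions this prompt got wrong


Optimizer = Callable[[list[MemoryEntry]], list[str]]  # hypothetical stand-in for M_opt


def score_prompt(prompt: str, opinions: list[Example], llm: LLM) -> MemoryEntry:
    """Step (1): accuracy on the user's opinions, keeping the mis-aligned cases."""
    wrong = [(q, a) for q, a in opinions if llm(prompt, q) != a]
    return MemoryEntry(prompt, (len(opinions) - len(wrong)) / len(opinions), wrong)


def update_memory(memory: list[MemoryEntry], new: list[MemoryEntry], k: int) -> list[MemoryEntry]:
    """Step (2): keep the k highest-scoring distinct prompts (assumed memory rule)."""
    best: dict[str, MemoryEntry] = {}
    for entry in memory + new:
        if entry.prompt not in best or entry.score > best[entry.prompt].score:
            best[entry.prompt] = entry
    return sorted(best.values(), key=lambda e: e.score, reverse=True)[:k]


def optimize(init_prompts: list[str], opinions: list[Example], llm: LLM,
             optimizer: Optimizer, T: int = 10, k: int = 3) -> list[MemoryEntry]:
    memory: list[MemoryEntry] = []
    candidates = list(init_prompts)
    for _ in range(T):
        scored = [score_prompt(p, opinions, llm) for p in candidates]  # (1) score new prompts
        memory = update_memory(memory, scored, k)                     # (2) update the memory
        candidates = optimizer(memory)                                # (3) generate new prompts
    return memory


# Toy usage: a fake LLM and a fake optimizer (both hypothetical).
def toy_llm(prompt: str, question: str) -> str:
    return "A" if "conservative" in prompt else "B"


def toy_optimizer(memory: list[MemoryEntry]) -> list[str]:
    # Revise every prompt that still has mis-aligned cases by appending a profile hint.
    return [e.prompt + " The user is conservative." for e in memory if e.misaligned]


opinions = [("Q1", "A"), ("Q2", "A"), ("Q3", "B")]
memory = optimize(["You are answering as this user."], opinions, toy_llm, toy_optimizer, T=3)
for e in memory:
    print(f"{e.score:.2f}  {e.prompt}")
```

Storing the mis-aligned examples next to each prompt lets the optimizer see where a prompt fails rather than only its score; in the toy run, the profile-informed prompt ends up ranked first.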

Abstract

As the diversity of users increases, the ability of large language models (LLMs) to provide personalized responses has become increasingly important. Existing approaches have achieved only limited success in LLM personalization, due to the absence of personalized learning or the reliance on shared personal data. This paper proposes a new approach for few-shot personalization of LLMs with their mis-aligned responses (Fermi). Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs, based on the user profile (e.g., demographic information) and a few examples of the user's previous opinions. During this iterative prompt improvement, we incorporate the contexts of mis-aligned responses by LLMs, which are especially crucial for effective personalization. In addition, we develop an effective inference method that further leverages the context of the test query together with the personalized prompts. Our experimental results demonstrate that Fermi significantly improves performance across various benchmarks compared to the best-performing baselines.
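The inference step mentioned above (called Retrieval-of-Prompt in the TL;DR) can be sketched as below. The selection rule is an assumption made for illustration: find the training question most similar to the test query, then return the highest-ranked learned prompt that answered that question as the user did, falling back to the top prompt. Bag-of-words cosine similarity is a hypothetical stand-in for a learned text encoder, and `bow_cosine` and `retrieve_prompt` are illustrative names.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (a stand-in for a text encoder)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def retrieve_prompt(query: str, train_questions: list[str],
                    prompts: list[str], correct: list[list[bool]]) -> str:
    """Pick a learned prompt for a test query (assumed selection rule).

    prompts are ordered best-first by training score; correct[i][j] is True
    when prompts[i] answered train_questions[j] as the user did.
    """
    nearest = max(range(len(train_questions)),
                  key=lambda j: bow_cosine(query, train_questions[j]))
    for i, prompt in enumerate(prompts):
        if correct[i][nearest]:
            return prompt
    return prompts[0]  # no prompt fit the nearest question: fall back to the best overall


train_questions = ["How important is religion in your life?",
                   "Should the government raise taxes on the rich?"]
prompts = ["Prompt emphasizing economic views", "Prompt emphasizing religious views"]
correct = [[False, True], [True, False]]
print(retrieve_prompt("How important is religion to you?", train_questions, prompts, correct))
```

The point of the design is that a single best-on-average prompt is not always best for every query; conditioning the choice on the query lets a lower-ranked prompt win when it fits the topic better.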

Paper Structure (27 sections, 7 equations, 20 figures, 21 tables, 1 algorithm)

Figures (20)

  • Figure 1: An overview of Fermi. Fermi iterates three steps to optimize the prompt from the given user information: (1) scoring new prompts, (2) updating the memory based on the score, and (3) generating new prompts (left). After the optimization, Fermi selectively uses the personalized prompts for inference via Retrieval-of-Prompt (right).
  • Figure 2: Prompt example. Example input prompt given to $\mathcal{M}_{\tt opt}$ to generate new prompts on the OpinionQA dataset, composed of the fixed input prompt $\text{p}_{\tt opt}$ (including fixed few-shot demonstrations) and the optimization memory $M^{t}$ (defined in the paper's new-prompt generation equation). A full version is in the Appendix.
  • Figure 3: Overall topic-wise improvement. Test accuracy of ChatGPT on OpinionQA under four different personalization methods. Detailed results are presented in the Appendix.
  • Figure 4: Qualitative comparison. Example prompts from All Info (middle) and Fermi (bottom) for a specific question (top) from GlobalOpinionQA. The prompt is inserted at <INS>. More examples are in the Appendix.
  • Figure 5: Optimization trajectory under different LLMs for $\mathcal{M}_{\tt opt}$. Average training accuracies on GlobalOpinionQA across optimization iterations ($T=10$) under OPRO and Fermi.
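Figure 2 describes the optimizer's input as the fixed prompt $\text{p}_{\tt opt}$ followed by the memory $M^{t}$. A minimal sketch of that assembly, assuming a simple text layout: the field labels, the ascending-score ordering, the `max_cases` cap, and the names `Entry` and `build_optimizer_input` are hypothetical choices, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    prompt: str
    score: float
    # (question, model's answer, user's answer) for cases where the prompt was mis-aligned
    misaligned: list[tuple[str, str, str]] = field(default_factory=list)


def build_optimizer_input(p_opt: str, memory: list[Entry], max_cases: int = 2) -> str:
    """Concatenate the fixed instruction p_opt with a text rendering of memory M^t."""
    lines = [p_opt.strip(), ""]
    # Ascending score (assumed), so the strongest prompt appears last, just before the request.
    for entry in sorted(memory, key=lambda e: e.score):
        lines.append(f"Prompt: {entry.prompt}")
        lines.append(f"Score: {entry.score:.2f}")
        for question, pred, gold in entry.misaligned[:max_cases]:
            lines.append(f"  Mis-aligned: Q: {question} | model: {pred} | user: {gold}")
        lines.append("")
    lines.append("Write a new prompt that better reflects this user's opinions.")
    return "\n".join(lines)


memory = [Entry("The user is a retired teacher.", 0.9),
          Entry("Answer as a typical adult.", 0.4, [("Q1", "A", "B")])]
print(build_optimizer_input("You improve personalization prompts.", memory))
```

Capping the number of mis-aligned cases per prompt keeps the optimizer input bounded as the memory grows, at the cost of showing the optimizer only part of each prompt's failures.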