Few-shot Personalization of LLMs with Mis-aligned Responses
Jaehyung Kim, Yiming Yang
TL;DR
Fermi is a few-shot personalization approach for LLMs that learns user-specific prompts from a user profile $U_{\text{pro}}$ and a small set of the user's previous opinions $U_{\text{opi}}$, using mis-aligned responses as learning signals. The method iteratively scores prompts, updates a memory of failures and successes, and generates improved prompts with a strong optimizer model $\mathcal{M}_{\text{opt}}$. At test time, Retrieval-of-Prompt selects a prompt based on the context of the query. Experiments on OpinionQA, GlobalOpinionQA, and LaMP tasks show that Fermi consistently surpasses strong baselines, with absolute improvements of up to about 6.8% on the QA benchmarks, notable gains on the other tasks, and good transferability across LLMs. Analyses indicate that mis-aligned contexts, the design of the prompt memory, and retrieval-based prompt selection each play a critical role, which supports the practicality and robustness of privacy-conscious, per-user LLM personalization. The work also notes practical considerations such as computational cost and the need for a strong optimizer model.
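The score, memory, and propose loop summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `score_prompt`, `optimize_prompt`, `answer_fn`, and `propose_fn` are hypothetical; `answer_fn` stands in for the personalized LLM, `propose_fn` stands in for the optimizer $\mathcal{M}_{\text{opt}}$, the initial prompt is assumed to be built from $U_{\text{pro}}$, and exact-match accuracy on $U_{\text{opi}}$ is assumed as the score.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]          # (question, user's recorded answer)
Misaligned = Tuple[str, str, str]  # (question, model answer, user's answer)
Entry = Tuple[float, str, List[Misaligned]]  # (score, prompt, mis-aligned cases)


def score_prompt(prompt: str, examples: List[Example],
                 answer_fn: Callable[[str, str], str]) -> Tuple[float, List[Misaligned]]:
    """Score a prompt on the user's examples and collect mis-aligned responses."""
    misaligned: List[Misaligned] = []
    correct = 0
    for question, expected in examples:
        predicted = answer_fn(prompt, question)
        if predicted == expected:
            correct += 1
        else:
            misaligned.append((question, predicted, expected))
    return correct / len(examples), misaligned


def optimize_prompt(initial_prompt: str, examples: List[Example],
                    answer_fn: Callable[[str, str], str],
                    propose_fn: Callable[[List[Entry]], str],
                    n_iters: int = 3, memory_size: int = 3) -> List[Entry]:
    """Iteratively propose prompts, keeping the best-scoring ones in memory."""
    score, misaligned = score_prompt(initial_prompt, examples, answer_fn)
    memory: List[Entry] = [(score, initial_prompt, misaligned)]
    for _ in range(n_iters):
        # Keep only the top-scoring prompts (assumed memory policy).
        memory.sort(key=lambda entry: entry[0], reverse=True)
        memory = memory[:memory_size]
        # The optimizer sees prompts, scores, and mis-aligned cases.
        candidate = propose_fn(memory)
        score, misaligned = score_prompt(candidate, examples, answer_fn)
        memory.append((score, candidate, misaligned))
    memory.sort(key=lambda entry: entry[0], reverse=True)
    return memory[:memory_size]
```

In practice both callables would wrap LLM calls; here they are left abstract so that the control flow, and the fact that mis-aligned responses are passed to the optimizer alongside scores, is easy to see.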
Abstract
As the diversity of users increases, the ability of large language models (LLMs) to provide personalized responses has become increasingly important. Existing approaches have had only limited success in LLM personalization, due to the absence of personalized learning or the reliance on shared personal data. This paper proposes a new approach for few-shot personalization of LLMs with their mis-aligned responses (Fermi). Our key idea is to learn a set of personalized prompts for each user by progressively improving the prompts using LLMs, based on the user's profile (e.g., demographic information) and a few examples of the user's previous opinions. During the iterative process of prompt improvement, we incorporate the contexts of the LLMs' mis-aligned responses, which are especially crucial for effective personalization. In addition, we develop an effective inference method that further leverages the context of the test query and the personalized prompts. Our experimental results demonstrate that Fermi significantly improves performance across various benchmarks compared to the best-performing baselines.
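One plausible reading of an inference method that uses the context of the test query together with the learned prompts is sketched below in Python. It is an illustration under stated assumptions, not the procedure from the paper: the function names `jaccard` and `retrieve_prompt` are hypothetical, token-overlap similarity is a stand-in for whatever query representation the method actually uses, and the selection rule (pick the prompt that was aligned on the training questions most similar to the query) is assumed.

```python
from typing import Dict, List, Set


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a simple stand-in for an embedding model."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve_prompt(query: str, train_questions: List[str],
                    aligned: Dict[str, Set[int]], k: int = 2) -> str:
    """Pick the prompt that was aligned on the k training questions nearest to the query.

    `aligned` maps each learned prompt to the indices of the training
    questions it answered in agreement with the user (assumed bookkeeping).
    """
    ranked = sorted(range(len(train_questions)),
                    key=lambda i: jaccard(query, train_questions[i]),
                    reverse=True)
    neighbors = ranked[:k]
    return max(aligned, key=lambda p: sum(i in aligned[p] for i in neighbors))
```

For instance, given two learned prompts where one was aligned only on policy questions and the other only on film questions, a film-related test query would be routed to the second prompt because its nearest training questions are the film ones.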
