Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation
Alireza Salemi, Surya Kallumadi, Hamed Zamani
TL;DR
The paper personalizes large language models without fine-tuning the LLM itself, by optimizing the retrieval component of a retrieval-augmented generation pipeline. It introduces two retrieval-optimization strategies, one based on reinforcement learning and one based on knowledge distillation, both driven by feedback from the downstream LLM, and couples them with pre- and post-generation retrieval-model selection to handle diverse personalization needs. Evaluation on the LaMP benchmark shows statistically significant improvements on six of the seven tasks, with the retrieval-selection approach achieving strong performance and approaching Oracle upper bounds on several datasets. The work advances practical, privacy-preserving LLM personalization by concentrating optimization on retrieval and retriever selection while keeping the LLM frozen, and it points to future work on prompt optimization and long-form personalization.
Abstract
This paper studies retrieval-augmented approaches for personalizing large language models (LLMs), which could have a substantial impact on various applications and domains. We present the first attempt to optimize the retrieval models that deliver a limited number of personal documents to large language models for the purpose of personalized generation. We develop two optimization algorithms that solicit feedback from the downstream personalized generation tasks for retrieval optimization: one based on reinforcement learning, whose reward function can be defined using any arbitrary metric for personalized generation, and another based on knowledge distillation from the downstream LLM to the retrieval model. This paper also introduces a pre- and post-generation retriever selection model that decides which retriever to choose for each LLM input. Extensive experiments on diverse tasks from the language model personalization (LaMP) benchmark reveal statistically significant improvements on six out of seven datasets.
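
The two feedback-driven objectives named in the abstract can be illustrated with a generic sketch. This is not the authors' implementation: the abstract does not give the loss functions, so the code below assumes a textbook REINFORCE-style surrogate loss for the reinforcement-learning variant and a KL-divergence loss for the distillation variant. The function names (`reinforce_loss`, `distillation_loss`), the `baseline` and `temperature` parameters, and the idea of a per-document utility score derived from the LLM are all hypothetical choices made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_loss(retriever_scores, sampled_index, reward, baseline=0.0):
    """Assumed REINFORCE-style surrogate loss for one sampled document.

    reward is any downstream generation metric (e.g. accuracy or ROUGE)
    obtained when the frozen LLM is prompted with the sampled document.
    """
    probs = softmax(retriever_scores)
    return -(reward - baseline) * math.log(probs[sampled_index])


def distillation_loss(retriever_scores, llm_utilities, temperature=1.0):
    """Assumed KL(target || retriever) distillation loss.

    llm_utilities holds one hypothetical score per candidate document,
    measuring how much that document helped the frozen LLM.
    """
    target = softmax(llm_utilities, temperature)
    predicted = softmax(retriever_scores)
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)


if __name__ == "__main__":
    scores = [0.2, 1.5, -0.3]      # retriever scores for three user documents
    utilities = [0.1, 0.9, 0.4]    # downstream quality when each is used
    print(reinforce_loss(scores, sampled_index=1, reward=0.9, baseline=0.5))
    print(distillation_loss(scores, utilities))
```

In both cases only the retriever's scores would receive gradients in a real training loop; the LLM is used solely to produce the reward or the utility scores, which matches the frozen-LLM setting described above.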
