Integrating Summarization and Retrieval for Enhanced Personalization via Large Language Models

Chris Richardson, Yao Zhang, Kellen Gillespie, Sudipta Kar, Arshdeep Singh, Zeynab Raeesy, Omar Zia Khan, Abhinav Sethy

TL;DR

The paper tackles personalization in NLP under input-length and latency constraints by combining offline, task-aware LLM-generated user summaries with runtime retrieval. It formalizes the objective as p(y|x,u) and introduces a summary-augmented retrieval framework where an offline summary s_u complements retrieved data to form prompts for a downstream model. On the LaMP benchmark, this approach achieves comparable or superior performance to retrieval-only methods while reducing retrieved data by ~75%, and can even excel with no retrieval in data-sparse settings. The method offers practical deployment benefits for runtime-constrained systems like voice assistants by precomputing user summaries offline and leveraging task-aware prompts for personalization.
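One way to write out the summary-augmented objective is sketched below. The symbols $p(y \mid x, u)$ and $s_u$ come from the summary above, and the prompt construction function $\phi_p$ is named in the Figure 1 caption; the remaining notation is an assumption introduced here for illustration: $D_u$ for the user's stored data, $R$ for the retrieval model, $k$ for the number of retrieved items, $\mathrm{LLM}_{\text{sum}}$ for the summarizing model, and $p_{\mathrm{LM}}$ for the downstream model.

$$
\begin{aligned}
s_u &= \mathrm{LLM}_{\text{sum}}(D_u) && \text{(computed offline, stored per user)} \\
p(y \mid x, u) &\approx p_{\mathrm{LM}}\bigl(y \mid \phi_p(x,\ R(x, D_u, k),\ s_u)\bigr) && \text{(evaluated at runtime)}
\end{aligned}
$$

Under this reading, a retrieval-only baseline drops $s_u$ from the prompt, and the no-retrieval setting corresponds to $k = 0$, where the prompt is built from $x$ and $s_u$ alone.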

Abstract

Personalization, the ability to tailor a system to individual users, is an essential factor in user experience with natural language processing (NLP) systems. With the emergence of Large Language Models (LLMs), a key question is how to leverage these models to better personalize user experiences. To personalize a language model's output, a straightforward approach is to incorporate past user data into the language model prompt, but this approach can result in lengthy inputs that exceed input-length limits and incur latency and cost issues. Existing approaches tackle such challenges by selectively extracting relevant user data (i.e., selective retrieval) to construct a prompt for downstream tasks. However, retrieval-based methods are limited by potential information loss, a lack of deeper user understanding, and cold-start challenges. To overcome these limitations, we propose a novel summary-augmented approach that extends retrieval-augmented personalization with task-aware user summaries generated by LLMs. The summaries can be generated and stored offline, enabling real-world systems with runtime constraints, such as voice assistants, to leverage the power of LLMs. Experiments show that our method, with 75% less retrieved user data, is on par with or outperforms retrieval augmentation on most tasks in the LaMP personalization benchmark. We demonstrate that offline summarization via LLMs combined with runtime retrieval enables better personalization performance on a range of tasks under practical constraints.

Paper Structure

This paper contains 13 sections, 3 equations, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Personalization is achieved by combining runtime-retrieved samples with an offline-generated user summary. Given a textual input $x$ that describes a task in natural language, the goal is to generate a personalized output $y$ for users. The retrieval model identifies the most relevant items from user data, and the retrieved items along with the offline user summary and $x$ form the basis for creating a prompt. This prompt is constructed using a prompt construction function $\phi_p$.
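The pipeline in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `UserProfile`, `retrieve`, and `build_prompt` are hypothetical names, the word-overlap scorer is a simple stand-in for whatever retrieval model the paper uses, and the prompt template is invented for the example. The offline summary is assumed to have been generated and stored beforehand.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UserProfile:
    """Hypothetical container for one user's data."""
    history: list[str]  # past user items available for runtime retrieval
    summary: str        # offline, LLM-generated user summary (precomputed)


def retrieve(query: str, history: list[str], k: int) -> list[str]:
    """Return the k history items sharing the most words with the query.

    Word overlap is an assumed stand-in scorer, chosen only for brevity.
    """
    query_tokens = set(query.lower().split())

    def overlap(item: str) -> int:
        return len(query_tokens & set(item.lower().split()))

    # sorted() is stable, so tied items keep their original order.
    return sorted(history, key=overlap, reverse=True)[:k]


def build_prompt(x: str, retrieved: list[str], summary: str) -> str:
    """Sketch of phi_p: combine the summary, retrieved items, and input x."""
    parts = []
    if summary:
        parts.append(f"User summary: {summary}")
    if retrieved:
        parts.append("Relevant past items:")
        parts.extend(f"- {item}" for item in retrieved)
    parts.append(f"Task: {x}")
    return "\n".join(parts)


profile = UserProfile(
    history=[
        "Notes on sourdough baking at home",
        "A review of trail running shoes",
        "Baking bread with whole wheat flour",
    ],
    summary="Writes practical guides about baking and running.",
)
x = "Suggest a title for an article about baking bread"
prompt = build_prompt(x, retrieve(x, profile.history, k=1), profile.summary)
print(prompt)
```

Passing `k=0` to `retrieve` yields a prompt built from the summary and `x` alone, which mirrors the no-retrieval setting mentioned in the TL;DR. The resulting string would then be sent to the downstream model.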