Persona-DB: Efficient Large Language Model Personalization for Response Prediction with Collaborative Data Refinement

Chenkai Sun, Ke Yang, Revanth Gangi Reddy, Yi R. Fung, Hou Pong Chan, Kevin Small, ChengXiang Zhai, Heng Ji

TL;DR

Persona-DB targets efficient LLM personalization by building structured, generalizable user personas through hierarchical refinement and by filling knowledge gaps through collaborative refinement. The framework distills user histories into high-level DP/IP constructs and joins in knowledge from similar users, retrieved from a collaborative database by cosine similarity and mixed with the user's own data at a composition ratio $x$. Empirical results on RFPN and OpinionQA show improved correlation and F1/accuracy at reduced retrieval sizes, with pronounced gains in cold-start scenarios and as retrieval capacity grows. The approach enables accurate, context-efficient personalization for users with either sparse or extensive interaction histories, and it highlights the growing value of collaborative knowledge in retrieval-augmented personalization.
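The retrieval-and-composition step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cosine`, `top_k`, `compose_retrieval`), the toy two-dimensional embeddings, and the reading of the composition ratio as the fraction of the retrieval budget given to collaborative entries are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Entry = Tuple[str, Vector]  # (persona or history text, embedding); hypothetical layout


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, pool: List[Entry], k: int) -> List[str]:
    """Return the texts of the k pool entries most similar to the query."""
    ranked = sorted(pool, key=lambda e: cosine(query, e[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def compose_retrieval(
    query: Vector,
    own_pool: List[Entry],
    collab_pool: List[Entry],
    capacity: int,
    ratio: float,
) -> List[str]:
    """Fill a retrieval budget from the user's own and the collaborative pool.

    Assumption: `ratio` is the share of `capacity` reserved for collaborative
    entries; the remainder goes to the user's own entries.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    n_collab = min(round(capacity * ratio), len(collab_pool))
    n_own = min(capacity - n_collab, len(own_pool))
    return top_k(query, own_pool, n_own) + top_k(query, collab_pool, n_collab)


# Toy data: embeddings are hand-made, not produced by any real encoder.
own = [
    ("likes sci-fi", [1.0, 0.0]),
    ("reads tech news", [0.9, 0.1]),
    ("dislikes sports", [0.0, 1.0]),
]
collab = [
    ("peer: follows AI policy", [0.8, 0.2]),
    ("peer: watches football", [0.1, 0.9]),
]
selected = compose_retrieval([1.0, 0.0], own, collab, capacity=2, ratio=0.5)
print(selected)
```

In this sketch a user with an empty `own_pool` (a cold-start case) still receives the collaborative share of the budget, which is the intuition behind borrowing knowledge from similar users.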

Abstract

The increasing demand for personalized interactions with large language models (LLMs) calls for methodologies capable of accurately and efficiently identifying user opinions and preferences. Retrieval augmentation emerges as an effective strategy, as it can accommodate a vast number of users without the costs from fine-tuning. Existing research, however, has largely focused on enhancing the retrieval stage and devoted limited exploration toward optimizing the representation of the database, a crucial aspect for tasks such as personalization. In this work, we examine the problem from a novel angle, focusing on how data can be better represented for more data-efficient retrieval in the context of LLM customization. To tackle this challenge, we introduce Persona-DB, a simple yet effective framework consisting of a hierarchical construction process to improve generalization across task contexts and collaborative refinement to effectively bridge knowledge gaps among users. In the evaluation of response prediction, Persona-DB demonstrates superior context efficiency in maintaining accuracy with a significantly reduced retrieval size, a critical advantage in scenarios with extensive histories or limited context windows. Our experiments also indicate a marked improvement of over 10% under cold-start scenarios, when users have extremely sparse data. Furthermore, our analysis reveals the increasing importance of collaborative knowledge as the retrieval capacity expands.

Paper Structure (18 sections, 3 equations, 13 figures, 6 tables)

Figures (13)

  • Figure 1: The image outlines the Persona-DB workflow, which starts by distilling and inducing abstract personas from users' interaction histories. It then leverages the cache layer to facilitate the joining of relevant user databases, effectively borrowing knowledge to fill contextual gaps in the primary user's data. This enriched data pool is subsequently used by the retriever for personalized inference.
  • Figure 2: During the retrieval-augmentation stage, the retriever selects data from the user's and the collaborative databases and composes them at a set ratio to inform the LLM. This strategy aims to enable the model to address challenges like sparse user interactions (e.g., cold-start) and domain irrelevance, offering an effective approach to LLM personalization in environments lacking user graphs.
  • Figure 3: Performance comparison on the OpinionQA task. The plot shows that Persona-DB outperforms the baselines consistently.
  • Figure 4: The figure illustrates how the correlation metric shifts as the retrieval capacity and the proportion of collaborative content change. The trends show that collaborative retrieval becomes more important as the retrieval size grows.
  • Figure 5: Case Studies.
  • ...and 8 more figures