Improving RAG for Personalization with Author Features and Contrastive Examples

Mert Yazan, Suzan Verberne, Frederik Situmeang

TL;DR

Personalization in retrieval-augmented generation (RAG) often misses fine-grained author traits. The paper proposes enriching the LLM prompt with author features and Contrastive Examples (CE) to emphasize what makes an author's style unique, achieving up to a 15% relative improvement over baselines. The approach uses LaMP datasets with Contriever as the retriever and shows that combining CE with author features yields strong gains, particularly for LaMP-7, without adding computational overhead. This work introduces a new paradigm for RAG in which contrastive context complements retrieved samples, enabling more precise, author-aware generation and opening avenues for further information retrieval (IR) research on CE retrieval.

Abstract

Personalization with retrieval-augmented generation (RAG) often fails to capture fine-grained features of authors, making it hard to identify their unique traits. To enrich the RAG context, we propose providing Large Language Models (LLMs) with author-specific features, such as average sentiment polarity and frequently used words, in addition to past samples from the author's profile. We introduce a new feature called Contrastive Examples: documents from other authors are retrieved to help the LLM identify what makes an author's style unique in comparison to others. Our experiments show that adding a couple of sentences about the named entities, dependency patterns, and words a person uses frequently significantly improves personalized text generation. Combining features with contrastive examples boosts the performance further, achieving a relative 15% improvement over baseline RAG while outperforming the benchmarks. Our results show the value of fine-grained features for better personalization, while opening a new research dimension for including contrastive examples as a complement to RAG. We release our code publicly.
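
As an illustration of the idea in the abstract, the following is a minimal sketch of how a prompt might combine the three kinds of context: retrieved samples from the author, a feature sentence derived from the author's profile, and contrastive examples from other authors. It is not the authors' released code. The function names (`frequent_words`, `build_prompt`), the prompt wording, and the tiny stopword list are all assumptions made for this example, and only one feature (frequently used words) is shown. Retrieval itself is out of scope here: the retrieved samples and contrastive examples are passed in as plain lists.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "i", "this", "that"}


def frequent_words(profile_docs, k=3):
    """Return the k most frequent non-stopword tokens in an author's profile."""
    tokens = []
    for doc in profile_docs:
        tokens += [t for t in re.findall(r"[a-z']+", doc.lower()) if t not in STOPWORDS]
    return [word for word, _ in Counter(tokens).most_common(k)]


def build_prompt(query, retrieved_samples, profile_docs, contrastive_examples, k=3):
    """Assemble the three context inputs plus the query into one prompt string."""
    # (1) Baseline RAG context: samples retrieved from the author's own profile.
    parts = ["Past samples from the author:"]
    parts += [f"- {s}" for s in retrieved_samples]
    # (2) Author features, expressed as a short sentence (hypothetical wording).
    parts.append("Author features:")
    parts.append("Frequently used words: " + ", ".join(frequent_words(profile_docs, k)))
    # (3) Contrastive examples: documents written by other authors.
    parts.append("Samples from other authors (for contrast):")
    parts += [f"- {s}" for s in contrastive_examples]
    parts.append("Input: " + query)
    return "\n".join(parts)


if __name__ == "__main__":
    profile = ["Great coffee, great mood", "Coffee is great"]
    print(build_prompt(
        query="Paraphrase this tweet",
        retrieved_samples=profile[:1],
        profile_docs=profile,
        contrastive_examples=["Markets closed lower today"],
        k=2,
    ))
```

Computing the feature once per author profile and passing it as a sentence keeps the extra context to a line or two of prompt text.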

Paper Structure

This paper contains 10 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: An overview of our approach. The language model receives three inputs: (1) most similar samples from the author profile, given the author's input, (2) features derived from the author profile, (3) contrastive examples gathered from other users. Input (1) denotes the baseline RAG approach, while (2) and (3) are our additions to enrich RAG context.
  • Figure 2: The improvement each feature provides for ROUGE-L on validation sets when included on top of baseline RAG. In this case, features are not combined but used individually. A negative change signifies that the feature hurts the baseline. Values inside parentheses show the number of contrastive users.
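
For readers who want to reproduce the kind of comparison reported here, the sketch below shows how a relative improvement over a baseline score can be computed. The helper name `relative_improvement` and the example scores are hypothetical and are not taken from the paper. The Figure 2 caption does not say whether it plots absolute or relative change, so this only illustrates the "relative improvement" wording used in the abstract.

```python
def relative_improvement(baseline, score):
    """Return the change of `score` over `baseline` as a fraction of the baseline.

    A negative result means the variant scored below the baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline


if __name__ == "__main__":
    # Hypothetical ROUGE-L scores, chosen only to illustrate the arithmetic.
    baseline_rag = 0.40
    with_features = 0.46
    print(f"{relative_improvement(baseline_rag, with_features):.1%}")
```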