On the Way to LLM Personalization: Learning to Remember User Conversations

Lucie Charlotte Magister, Katherine Metcalf, Yizhe Zhang, Maartje ter Hoeve

TL;DR

PLUM is a pipeline that augments user conversations by up-sampling them into question-answer pairs, which are then used to finetune a low-rank adaptation (LoRA) adapter with a weighted cross-entropy loss.

Abstract

Large Language Models (LLMs) have quickly become invaluable assistants for a variety of tasks. However, their effectiveness is constrained by their ability to tailor responses to human preferences and behaviors via personalization. Prior work in LLM personalization has largely focused on style transfer or incorporating small factoids about the user, as knowledge injection remains an open challenge. In this paper, we explore injecting knowledge of prior conversations into LLMs to enable future work on less redundant, personalized conversations. We identify two real-world constraints: (1) conversations are sequential in time and must be treated as such during training, and (2) per-user personalization is only viable in parameter-efficient settings. To this end, we propose PLUM, a pipeline that performs data augmentation by up-sampling conversations as question-answer pairs, which are then used to finetune a low-rank adaptation (LoRA) adapter with a weighted cross-entropy loss. Even in this first exploration of the problem, we perform competitively with baselines such as RAG, attaining an accuracy of 81.5% across 100 conversations.
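
The abstract names a weighted cross-entropy loss but does not say how the weights are chosen. The sketch below only illustrates the general form of such a loss: a per-token cross entropy averaged with per-token weights. The function name `weighted_cross_entropy` and the idea of passing one weight per token are assumptions made for illustration, not the authors' formulation.

```python
import math


def weighted_cross_entropy(logits, targets, weights):
    """Weighted mean of per-token cross entropy (illustrative sketch).

    logits  : list of per-token score lists, one score per vocabulary entry
    targets : list of gold token ids
    weights : list of non-negative per-token weights (hypothetical scheme;
              the paper's actual weighting is not given in the abstract)
    """
    if not (len(logits) == len(targets) == len(weights)):
        raise ValueError("logits, targets and weights must have equal length")

    total, weight_sum = 0.0, 0.0
    for scores, target, weight in zip(logits, targets, weights):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        token_loss = log_norm - scores[target]  # equals -log p(target)
        total += weight * token_loss
        weight_sum += weight

    if weight_sum == 0:
        raise ValueError("at least one weight must be positive")
    return total / weight_sum
```

With all weights equal this reduces to the ordinary mean cross entropy; a weight of zero removes a token from the loss entirely, which is one common way to down-weight parts of a training sample.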

Paper Structure

This paper contains 44 sections, 2 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: An overview of PLUM, a two-stage pipeline for injecting knowledge of prior user conversations into the LLM. The first step of the pipeline focuses on augmenting user conversations as positive and negative question-answer pairs about the conversation. These are then used in the finetuning step, where the LLM is trained on samples of a single conversation at a time for 10 epochs with a weighted cross entropy loss.
  • Figure 2: Accuracy over time for PLUM with the system prompt.
  • Figure 3: Consistency plots visualizing whether the 'yes'/'no' question was predicted correctly (blue) or incorrectly (orange) for a given time step. Here, a time step refers to the model having seen all samples of a conversation for the specified number of epochs. The lower left triangle of the plot is gray, as these conversations have not been seen yet.
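
The captions above describe a sequential schedule: the model sees the samples of one conversation at a time, for a fixed number of epochs, before moving on to the next conversation. A minimal sketch of that ordering, and of which conversations count as "seen" at each time step, might look as follows. The names `sequential_schedule` and `seen_mask` are hypothetical, and the actual training step is left out; how the seen/unseen cells map onto the triangles of Figure 3 depends on the plot's axis orientation, which is not assumed here.

```python
def sequential_schedule(conversations, epochs=10):
    """Yield (time_step, epoch, sample) in the order a sequential finetune visits them.

    conversations : list of sample lists, ordered in time
    epochs        : passes over each conversation before moving on
                    (10 in the Figure 1 caption)
    """
    for time_step, samples in enumerate(conversations):
        for epoch in range(epochs):
            for sample in samples:
                yield time_step, epoch, sample


def seen_mask(num_conversations):
    """mask[t][c] is True if conversation c has been trained on by the end of time step t."""
    return [
        [c <= t for c in range(num_conversations)]
        for t in range(num_conversations)
    ]
```

Evaluating every conversation after every time step and keeping only the cells where `seen_mask` is true gives the kind of grid the consistency plots visualize, with the remaining cells left blank.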