ComMer: a Framework for Compressing and Merging User Data for Personalization

Yoel Zeldes, Amir Zait, Ilia Labzovsky, Danny Karmon, Efrat Farkash

TL;DR

ComMer addresses the problem of personalizing large language models under data and compute constraints. It compresses each user document into a fixed-size latent representation, merges the per-document compressions with mean pooling, and feeds the result to a frozen LLM as a prompt. Training updates only the compression embeddings and LoRA adapters, so the backbone is never fine-tuned per user, which keeps personalization scalable. Results show strong performance on personalized skill learning under tight token budgets, but reveal limitations on knowledge-intensive tasks, where detailed information is lost in compression. The work highlights trade-offs between document quantity, merging strategy, and pretraining, and offers guidance for efficient multi-document personalization along with directions for future improvement.

Abstract

Large Language Models (LLMs) excel at a wide range of tasks, but adapting them to new data, particularly for personalized applications, poses significant challenges due to resource and computational constraints. Existing methods either rely on exposing fresh data to the model through the prompt, which is limited by context size and computationally expensive at inference time, or fine-tuning, which incurs substantial training and update costs. In this paper, we introduce ComMer - Compress and Merge - a novel framework that efficiently personalizes LLMs by compressing users' documents into compact representations, which are then merged and fed into a frozen LLM. We evaluate ComMer on two types of personalization tasks - personalized skill learning, using the tweet paraphrasing dataset and the personalized news headline generation dataset from the LaMP benchmark, and knowledge-intensive, using the PerLTQA dataset. Our experiments demonstrate that in constrained inference budget scenarios ComMer achieves superior quality in skill learning tasks, while highlighting limitations in knowledge-intensive settings due to the loss of detailed information. These results offer insights into trade-offs and potential optimizations in multi-document compression for personalization.
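
The compress-then-merge flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `toy_compress` is a hypothetical stand-in for the trainable, LoRA-adapted compressor (here it simply averages chunks of token embeddings), and the dimensions, document lengths, and helper names are assumptions chosen for illustration. The point it shows is that the merged representation, and therefore the prompt cost, stays at a fixed number of slots regardless of how many documents a user has.

```python
import numpy as np


def toy_compress(doc_embeddings, num_slots):
    """Hypothetical stand-in for the trainable compressor.

    Maps a (num_tokens, dim) document to a fixed (num_slots, dim) array by
    averaging contiguous chunks of tokens. The real compressor is an LLM
    with LoRA and learned compression embeddings; this is only a placeholder.
    """
    if doc_embeddings.shape[0] < num_slots:
        raise ValueError("document must have at least num_slots tokens")
    chunks = np.array_split(doc_embeddings, num_slots, axis=0)
    return np.stack([chunk.mean(axis=0) for chunk in chunks])


def merge(compressions):
    """Mean-pool per-document compressions into one (num_slots, dim) array."""
    return np.mean(np.stack(compressions), axis=0)


def build_prompt(merged, query_embeddings):
    """Prepend the merged compression to the query's token embeddings."""
    return np.concatenate([merged, query_embeddings], axis=0)


rng = np.random.default_rng(0)
dim, num_slots = 16, 4

# Three documents of different lengths (random vectors, for illustration only).
docs = [rng.normal(size=(n, dim)) for n in (10, 25, 40)]

# Each document is compressed independently, then the results are merged.
compressions = [toy_compress(doc, num_slots) for doc in docs]
merged = merge(compressions)

# The frozen LLM would consume this sequence of embeddings.
query = rng.normal(size=(6, dim))
prompt = build_prompt(merged, query)

print(merged.shape)  # (4, 16): independent of the number of documents
print(prompt.shape)  # (10, 16): num_slots + query length
```

Because each document is compressed on its own, a new document only requires one more compression and a recomputed mean, rather than reprocessing the full history.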

Paper Structure

This paper contains 26 sections, 7 figures, 5 tables.

Figures (7)

  • Figure 1: Approaches for adapting LLMs to new data include integrating it through the prompt or modifying the model by updating its existing weights or introducing new trainable weights. Both methods have advantages and drawbacks, while our proposed method, ComMer, combines the benefits of both approaches.
  • Figure 2: Left: ComMer architecture. Each document is independently compressed into a fixed-size representation by a trainable compressor. These compressions are then merged using mean pooling. Finally, the aggregated compression is plugged into a frozen LLM. Right: The compressor architecture. The input is appended with trainable compression embeddings and processed by a frozen LLM, which is adapted using a trainable LoRA. The compressor's output consists of the final layer's representations of the compression embeddings.
  • Figure 3: The trade-off between cost (number of tokens in the prompt) and quality (perplexity on the left, and ROUGE-L on the right), demonstrated using two personalized skill learning tasks: personalized tweet paraphrasing (top) and personalized news headline generation (bottom). Each curve represents models trained with different numbers of embeddings: 4, 8, 16, 32, 64, and 128, ordered from left to right. In the small token budget regime, ComMer achieves higher quality results with fewer resources than prompt-tuning, highlighting its ability to efficiently extract personalization signals from multiple documents.
  • Figure 4: Perplexity of ComMer as a function of the number of documents follows a power-law relation in both the personalized tweet paraphrasing (left) and the personalized news headline generation (right) tasks. This pattern holds across all numbers of compression embeddings used by ComMer. It suggests that increasing the number of documents will further enhance ComMer's quality.
  • Figure 5: The trade-off between cost (number of tokens in the prompt) and quality (perplexity on the left, and ROUGE-L on the right), demonstrated using PerLTQA. Each curve represents models trained with different numbers of embeddings: 4, 8, 16, 32, 64, and 128, ordered from left to right. Compressing multiple documents degrades quality, indicating that ComMer may not be well-suited for the knowledge-intensive nature of PerLTQA.
  • ...and 2 more figures
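
The power-law relation mentioned in the Figure 4 caption can be checked with a simple log-log fit. The sketch below assumes a pure power law of the form perplexity = a · n^(−b) with no additive offset, which may differ from the exact form used in the paper. The data points are synthetic, generated from a known law purely to verify the fitting routine; they are not results from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import numpy as np


def fit_power_law(num_docs, perplexity):
    """Fit perplexity ~= a * num_docs ** (-b) by least squares in log-log space.

    Taking logs gives log(ppl) = log(a) - b * log(n), a straight line,
    so an ordinary degree-1 polynomial fit recovers both parameters.
    """
    slope, intercept = np.polyfit(np.log(num_docs), np.log(perplexity), 1)
    return np.exp(intercept), -slope


# Synthetic points from a known law (a=12.0, b=0.1); NOT data from the paper.
num_docs = np.array([1, 2, 4, 8, 16, 32], dtype=float)
synthetic_ppl = 12.0 * num_docs ** -0.1

a, b = fit_power_law(num_docs, synthetic_ppl)
print(f"a = {a:.3f}, b = {b:.3f}")

# Extrapolate to a larger document count under the fitted law.
predicted = a * 64.0 ** (-b)
print(f"predicted perplexity at 64 documents: {predicted:.3f}")
```

A straight line on log-log axes is what supports the caption's suggestion that adding documents should keep improving quality, although extrapolating beyond the measured range is only as reliable as the power-law assumption itself.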