Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor
Barys Liskavets, Shuvendu Roy, Maxim Ushakov, Mark Klibanov, Ali Etemad, Shane Luke
TL;DR
This work addresses prompt compression for large language models by introducing Task-agnostic Prompt Compression (TPC), a general framework that does not rely on input questions or handcrafted templates. It combines a Context-relevant Task Descriptor (CTD), which generates a task description from the prompt, a context-aware sentence encoder (CSE), which assesses sentence relevance, and reinforcement learning, which fine-tunes the descriptor toward informative compression. The authors curate two datasets (CTD and MCQR) to train the descriptor and encoder. They evaluate three model sizes (Base, Large, Huge) on LongBench and ZeroSCROLLS in both prompt-aware and prompt-agnostic settings: the largest model outperforms existing state-of-the-art methods, and the smallest achieves competitive results. The approach is reported to generalize better across tasks and domains, and the authors plan to release the datasets and code to support reproducibility and further development.
Abstract
The rise of Large Language Models (LLMs) has led to significant interest in prompt compression, a technique that reduces the length of input prompts while preserving critical information. However, prominent prompt compression approaches often require explicit questions or handcrafted templates, which limits their generalizability. We propose Task-agnostic Prompt Compression (TPC), a novel framework that generalizes compression across tasks and domains without requiring input questions or templates. TPC generates a context-relevant task description using a task descriptor that is trained on a curated dataset of context and query pairs and fine-tuned via reinforcement learning with a reward function designed to capture the most relevant information. The task description is then used to compute the relevance of each sentence in the prompt, and these relevance scores determine the compressed prompt. We introduce three model sizes (Base, Large, and Huge). The largest model outperforms existing state-of-the-art methods on the LongBench and ZeroSCROLLS benchmarks, and our smallest model performs comparably to existing solutions while being considerably smaller.
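To make the relevance-scoring step concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a task description is already available (in TPC it would be generated by the task descriptor; here it is written by hand), and it replaces the learned context-aware sentence encoder with a toy bag-of-words embedding. The function names (`split_sentences`, `embed`, `cosine`, `compress`), the stopword list, and the greedy word-budget selection rule are all hypothetical choices made for illustration; the abstract does not specify how sentences are selected once they are scored.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list so that function words do not dominate the toy similarity.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "by", "was", "is", "to", "and"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def embed(text):
    """Toy stand-in for a sentence encoder: bag-of-words counts without stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compress(prompt, task_description, max_words):
    """Keep the sentences most similar to the task description within a word budget."""
    sentences = split_sentences(prompt)
    query = embed(task_description)
    scored = [(cosine(embed(s), query), i, s) for i, s in enumerate(sentences)]

    kept, used = [], 0
    # Assumed selection rule: greedy by descending score, ties broken by position.
    for _, i, s in sorted(scored, key=lambda item: (-item[0], item[1])):
        n_words = len(s.split())
        if used + n_words <= max_words:
            kept.append((i, s))
            used += n_words

    # Restore the original sentence order in the compressed prompt.
    return " ".join(s for _, s in sorted(kept))


prompt = (
    "The cat sat on the mat. "
    "Revenue grew by ten percent in 2023. "
    "The weather was mild."
)
description = "Summarize the company's revenue growth."
print(compress(prompt, description, max_words=8))
```

With an 8-word budget, only the revenue sentence is both relevant to the description and small enough to fit, so it alone is kept. A learned encoder would capture semantic relevance that this word-overlap stand-in cannot.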
