Learning to Compress Prompt in Natural Language Formats

Yu-Neng Chuang; Tianwei Xing; Chia-Yuan Chang; Zirui Liu; Xun Chen; Xia Hu


TL;DR

The paper tackles the inefficiency of processing long prompts in LLMs by introducing Natural Language Prompt Encapsulation (Nano-Capsulator), which converts long NL prompts into NL-formatted CapsulePrompts. It jointly optimizes semantic preservation and a length-constrained reward to produce prompts that maintain downstream utility while enabling transferability across diverse LLMs. The approach achieves substantial gains, including up to $81.4\%$ prompt length reduction, $4.5\times$ faster inference, and $80.1\%$ API cost savings, with demonstrated transferability to Vicuna-13B, PaLM, Claude2, and unseen datasets on both few-shot CoT and reading comprehension tasks. These results indicate Nano-Capsulator’s potential for cost-efficient, scalable prompt deployment across API-based and open LLMs in real-world settings.

Abstract

Large language models (LLMs) handle a wide range of natural language processing tasks well, but their abilities are constrained by inferior performance with long context, slow inference speed, and the high cost of computing the results. Deploying LLMs with precise and informative context helps users process large-scale datasets more effectively and cost-efficiently. Existing works rely on compressing long prompt contexts into soft prompts. However, soft prompt compression has limited transferability across different LLMs, especially API-based LLMs. To this end, this work aims to compress lengthy prompts into natural language form that transfers across LLMs. This poses two challenges: (i) Natural Language (NL) prompts are incompatible with back-propagation, and (ii) NL prompts lack flexibility in imposing length constraints. In this work, we propose a Natural Language Prompt Encapsulation (Nano-Capsulator) framework that compresses original prompts into NL-formatted Capsule Prompts while maintaining prompt utility and transferability. Specifically, to tackle the first challenge, the Nano-Capsulator is optimized by a reward function that interacts with the proposed semantics-preserving loss. To address the second challenge, the Nano-Capsulator is optimized by a reward function featuring length constraints. Experimental results demonstrate that the Capsule Prompt can reduce the original length by 81.4%, decrease inference latency by up to 4.5x, and save 80.1% of budget overheads while providing transferability across diverse LLMs and different datasets.
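The abstract mentions a reward function featuring length constraints but does not give its form here. The following is a minimal illustrative sketch, not the authors' formulation: it assumes a hypothetical hard cutoff in which a compressed prompt earns its utility score only when it fits a token budget. The names `length_constrained_reward`, `length_reduction`, `utility_score`, and `max_len` are all invented for illustration.

```python
def length_constrained_reward(utility_score: float, capsule_len: int, max_len: int) -> float:
    """Hypothetical reward: keep the utility score only if the compressed
    prompt fits the length budget; otherwise return zero.

    This hard cutoff is an assumption made for illustration, not the
    reward defined in the paper.
    """
    if capsule_len > max_len:
        return 0.0
    return utility_score


def length_reduction(original_len: int, compressed_len: int) -> float:
    """Fraction of the original prompt length removed by compression."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return 1.0 - compressed_len / original_len


if __name__ == "__main__":
    # A prompt that fits the budget keeps its utility score.
    print(length_constrained_reward(0.8, capsule_len=50, max_len=100))
    # A prompt that exceeds the budget gets no reward.
    print(length_constrained_reward(0.8, capsule_len=150, max_len=100))
    # Compressing 1000 tokens to 186 tokens removes 81.4% of the length.
    print(round(length_reduction(1000, 186), 3))
```

Under this sketch, the reported 81.4% reduction would correspond to, for example, a 1000-token prompt compressed to 186 tokens. The token counts are illustrative and are not taken from the paper.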


Paper Structure (24 sections, 3 equations, 11 figures, 6 tables, 1 algorithm)


Introduction
Related Work
Soft Prompt Compression
Context Distillation for Compression
Long Prompt Encapsulation
Prompt Encapsulation
NL-formatted Prompt Compression
Prompt Utility Preservation
Compression with Reward
Algorithm of Nano-Capsulator
Experiments
Dataset
Experiment Settings
Main Results (RQ1)
Contributions of Utility Preservation (RQ2)
...and 9 more sections

Figures (11)

Figure 1: An example of successful prompt compression with NL formats. The compressed NL-formatted prompt (green) aims to obtain a shorter length and maintain transferability and utility of the long prompt (red).
Figure 2: Illustration of the Nano-Capsulator training framework. Nano-Capsulator compresses the long prompt under semantic preservation (Equation \ref{eq:sem}) and utility preservation (Equation \ref{eq:reward}). Questions are sampled from the training set to compute the reward scores for utility preservation.
Figure 3: Evaluation of transferability on Nano-Capsulator across unseen datasets.
Figure 4: Comparison results of CapsulePrompt and Zero-shot Summarization on GSM8K dataset (left) and MultiRC dataset (right).
Figure 5: Ablation studies: comparison of CapsulePrompt with GPT-35-Turbo Summarization on the CSQA and GSM8K datasets (left), and the contribution of the reward function from Equation \ref{eq:reward} (right).
...and 6 more figures
