Perception Compressor: A Training-Free Prompt Compression Framework in Long Context Scenarios

Jiwei Tang, Jin Xu, Tingwei Lu, Zhicheng Zhang, Yiming Zhao, Lin Hai, Hai-Tao Zheng

TL;DR

The paper tackles the challenges of using LLMs in long-context scenarios, where prompts contain substantial redundancy and key information may be positioned unfavorably. It introduces Perception Compressor, a training-free framework comprising a perception retriever, a dual-slope ratio allocator, and semi-guided iterative compression, which together select, reorder, and compress demonstrations while preserving key tokens. The method achieves state-of-the-art performance on NaturalQuestions, LongBench, and MuSiQue under varying compression budgets, and ablation studies confirm the necessity of each component. This approach offers a practical, scalable solution for efficient long-context prompting with meaningful gains in robustness and accuracy.

Abstract

Large language models (LLMs) demonstrate exceptional capabilities in various scenarios. In long-context scenarios, however, their prompts contain much redundant information, and the models are sensitive to the position of key information. To address these challenges, we present Perception Compressor, a training-free prompt compression framework. It comprises a perception retriever that leverages guiding questions and the instruction to retrieve the most relevant demonstrations, a dual-slope ratio allocator that dynamically allocates compression ratios and open-book ratios, and a semi-guided iterative compression step that retains key information at the token level while removing tokens that distract the LLM. We conduct extensive experiments on long-context benchmarks, namely NaturalQuestions, LongBench, and MuSiQue. Experimental results show that Perception Compressor outperforms existing methods by a large margin, achieving state-of-the-art performance.
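
As a rough illustration of the "retrieve the most relevant demonstrations, then reorder them" idea described in the abstract, the sketch below ranks demonstrations against a question and keeps the top few, most relevant first. It is not the paper's perception retriever: the scoring function `overlap_score` (plain word overlap) and the function name `retrieve_and_reorder` are hypothetical stand-ins chosen only to make the select-and-reorder step concrete, and the guiding questions, ratio allocation, and token-level compression are omitted.

```python
def overlap_score(question: str, text: str) -> float:
    """Hypothetical relevance score: fraction of question words found in text.

    This is a stand-in, NOT the scoring used by Perception Compressor.
    """
    q_words = set(question.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def retrieve_and_reorder(question: str, demos: list[str], keep: int) -> list[str]:
    """Keep the `keep` highest-scoring demonstrations, most relevant first."""
    scored = [(overlap_score(question, d), i, d) for i, d in enumerate(demos)]
    # Sort by descending score; ties keep their original order via the index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [d for _, _, d in scored[:keep]]


demos = [
    "The Eiffel Tower is in Paris",
    "The Titanic sank in the North Atlantic Ocean",
    "where did the ship sink",
]
selected = retrieve_and_reorder("where did the titanic sink", demos, keep=2)
```

Here `selected` holds the two highest-scoring demonstrations with the best match first. Word overlap is a weak proxy for relevance, so the same structure would usually be paired with a stronger scorer.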

Paper Structure (34 sections, 25 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Perplexity vs. contrast perplexity. Only a very small number of tokens related to key information have a high contrast perplexity, while the contrast perplexity of the other tokens is nearly the same. In contrast, the perplexity of different tokens varies significantly.
  • Figure 2: Framework of Perception Compressor. The original prompt can be divided into instruction, demonstrations, and question. Perception Compressor first uses the perception retriever to retrieve the most relevant demonstrations and reorders them from most to least relevant to the input question. Then, it performs a semi-guided iterative compression to obtain the final compressed prompt. The entire process is controlled by the compression ratios and open-book ratios allocated by the dual-slope ratio allocator.
  • Figure 3: Parameter sensitivity analysis on the 20th position of NaturalQuestions under a 2x constraint. We use LongChat-7B-v1.5-32k as the response model.
  • Figure 4: Generate guiding questions for the input question "where did the titanic sink at what ocean?".
  • Figure 5: Case study on NaturalQuestions (20 documents) (Liu et al., 2024) under a 4x constraint.
  • ...and 1 more figures