Density-aware Soft Context Compression with Semi-Dynamic Compression Ratio

Yijiong Yu, Shuai Yuan, Jie Zheng, Huazheng Wang, Ji Pei

Abstract

Soft context compression reduces the computational cost of processing long contexts in LLMs by encoding a long context into a smaller number of latent tokens. However, existing frameworks apply uniform compression ratios, failing to account for the extreme variance in the information density of natural language. While adopting a density-aware dynamic compression ratio seems intuitive, empirical investigations reveal that models intrinsically struggle with operations parameterized by input-dependent, continuous structural hyperparameters. To resolve this pitfall, we introduce the Semi-Dynamic Context Compression framework. Our approach features a Discrete Ratio Selector, which predicts a compression target from the intrinsic information density of the input and quantizes it to a predefined set of discrete compression ratios. The selector is jointly and efficiently trained with the compressor on synthetic data, using summary lengths as a proxy to create labels for compression-ratio prediction. Extensive evaluations confirm that our density-aware framework, using mean pooling as the backbone, consistently outperforms static baselines, establishing a robust Pareto frontier for context compression techniques. Our code, data, and model weights are available at https://github.com/yuyijiong/semi-dynamic-context-compress
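To make the Discrete Ratio Selector's quantization step concrete, here is a minimal sketch of snapping a continuous, density-based prediction to a predefined discrete ratio set. The candidate ratios, the log2 parameterization, and the `scale` offset (analogous to the parameter swept in Figure 4) are illustrative assumptions, not the authors' exact implementation.

```python
import math

# Assumed candidate compression ratios; the paper's actual set may differ.
DISCRETE_RATIOS = [1, 2, 4, 8, 16, 32]

def select_ratio(predicted_log2_ratio: float, scale: float = 0.0) -> int:
    """Quantize a continuous density-based prediction to the nearest
    discrete compression ratio, shifted by a global `scale` offset."""
    target = predicted_log2_ratio + scale
    # Snap to the candidate whose log2 ratio is closest to the target.
    return min(DISCRETE_RATIOS, key=lambda r: abs(math.log2(r) - target))

# A dense passage predicted at log2-ratio 1.3 is compressed mildly (ratio 2),
# while a sparse passage at 4.2 is compressed aggressively (ratio 16).
assert select_ratio(1.3) == 2
assert select_ratio(4.2) == 16
```

Under this reading, raising `scale` shifts every prediction toward larger ratios, which is how a single trained selector could trace out the accuracy-vs-compression curve that Figure 4 sweeps.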

Paper Structure

This paper contains 31 sections, 7 equations, and 8 figures.

Figures (8)

  • Figure 1: Three typical feature extraction mechanisms for soft context compression.
  • Figure 2: Our semi-dynamic context compression method, utilizing mean-pooling as the optimal structural backbone (a minimal mean-pooling sketch follows this list).
  • Figure 3: Accuracy vs. Average Compression Ratio across three feature extraction methods (mean-pooling, last tokens, compression tokens), evaluated under fixed-ratio and fixed-length settings.
  • Figure 4: Accuracy vs. Average Compression Ratio for fixed-ratio vs. semi-dynamic mean-pooling. For the semi-dynamic method, the $scale$ parameter is varied across $\{-2, -1.5, -1, -0.5, 0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4\}$ to produce gradually increasing compression ratios. The dashed baseline represents no compression.
  • Figure 5: Variance of the selected compression ratios ($\log_2$) and absolute accuracy improvement of the semi-dynamic method over the fixed-ratio baseline.
  • ...and 3 more figures
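As referenced in the Figure 2 caption, here is a minimal sketch of mean-pooling feature extraction for soft context compression, assuming latent tokens are formed by averaging fixed windows of hidden states. The shapes and windowing scheme are illustrative assumptions rather than the paper's exact architecture.

```python
import torch

def mean_pool_compress(hidden: torch.Tensor, ratio: int) -> torch.Tensor:
    """Compress (seq_len, d) hidden states by averaging each window of
    `ratio` consecutive tokens into one latent token."""
    seq_len, d = hidden.shape
    pad = (-seq_len) % ratio  # zero-pad so seq_len divides evenly
    if pad:
        hidden = torch.cat([hidden, hidden.new_zeros(pad, d)])
    # (num_windows, ratio, d) -> mean over each window -> (num_windows, d)
    return hidden.view(-1, ratio, d).mean(dim=1)

# Example: 1000 context tokens at ratio 8 become 125 latent tokens.
latents = mean_pool_compress(torch.randn(1000, 768), ratio=8)
assert latents.shape == (125, 768)
```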