Developing Adaptive Context Compression Techniques for Large Language Models (LLMs) in Long-Running Interactions

Payal Fofadiya, Sunil Tiwari

Abstract

Large Language Models (LLMs) often experience performance degradation during long-running interactions due to increasing context length, memory saturation, and computational overhead. This paper presents an adaptive context compression framework that integrates importance-aware memory selection, coherence-sensitive filtering, and dynamic budget allocation to retain essential conversational information while controlling context growth. The approach is evaluated on the LOCOMO, LOCCO, and LongBench benchmarks to assess answer quality, retrieval accuracy, coherence preservation, and efficiency. Experimental results demonstrate that the proposed method achieves consistent improvements in conversational stability and retrieval performance while reducing token usage and inference latency compared with existing memory- and compression-based approaches. These findings indicate that adaptive context compression provides an effective balance between long-term memory preservation and computational efficiency in persistent LLM interactions.
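
To make the mechanisms named in the abstract concrete, the sketch below illustrates one plausible realization of importance-aware memory selection under a dynamic token budget. All names, the recency-decay heuristic, and the budget split are illustrative assumptions, not the implementation evaluated in this paper.

```python
# Minimal sketch of importance-aware memory selection under a dynamic
# token budget. All names and heuristics here are illustrative
# assumptions, not the paper's implementation.
from dataclasses import dataclass

@dataclass
class MemoryItem:
    text: str
    importance: float   # e.g., a relevance score from a retriever
    age: int            # turns elapsed since the item was created
    tokens: int         # token length of the item

def dynamic_budget(total_context: int, reserved_for_response: int,
                   history_fraction: float = 0.5) -> int:
    """Allocate a token budget for compressed history, leaving room
    for the current prompt and the model's response."""
    return int((total_context - reserved_for_response) * history_fraction)

def select_memories(items: list[MemoryItem], budget: int,
                    decay: float = 0.95) -> list[MemoryItem]:
    """Greedily keep the highest-scoring items that fit the budget.
    A recency decay favors recent turns, which helps preserve
    local conversational coherence."""
    scored = sorted(items, key=lambda m: m.importance * decay ** m.age,
                    reverse=True)
    kept, used = [], 0
    for m in scored:
        if used + m.tokens <= budget:
            kept.append(m)
            used += m.tokens
    # Re-order chronologically (oldest first) so the compressed
    # context reads coherently when reinserted into the prompt.
    return sorted(kept, key=lambda m: m.age, reverse=True)
```

In this reading, greedy selection approximates importance-aware retention, the decay factor stands in for coherence-sensitive filtering, and the budget function adapts the retained history to the available context window.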

Paper Structure

This paper contains 18 sections, 9 equations, 3 figures, 4 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Architecture of the proposed adaptive context compression framework for long-running LLM interactions.
  • Figure 2: Comparison of answer accuracy, retrieval accuracy, recall accuracy, and coherence scores across long-running conversational benchmarks. The proposed adaptive context compression method achieves slightly improved performance compared with existing memory-based approaches.
  • Figure 3: Efficiency comparison across long-context compression methods using reported efficiency values. The proposed adaptive context compression method achieves higher overall efficiency through adaptive token reduction and latency improvement.