AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees

Yangning Li, Shaoshen Chen, Yinghui Li, Yankai Chen, Hai-Tao Zheng, Hui Wang, Wenhao Jiang, Philip S. Yu

TL;DR

AdmTree introduces adaptive hierarchical compression for long-context processing by dynamically segmenting input based on information density, inserting gist tokens as leaves, and building a semantic binary tree with a lightweight aggregation module. By freezing the backbone LLM and training only lightweight components for gist-attention, embeddings, and aggregation, AdmTree achieves strong semantic fidelity with efficient inference. Comprehensive experiments on LongBench and dynamic dialogue settings demonstrate state-of-the-art performance across multiple tasks and robust scalability to varying context lengths and compression ratios, with ablations highlighting the value of the tree structure and adaptive gist allocation. The work also provides interpretability via tree-node attention patterns, suggesting future extensions such as mixture-of-experts for task-specific compression.

Abstract

The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
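The pipeline described above (density-based segmentation, gist tokens as leaves, hierarchical aggregation) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the names `Node`, `allocate_gists`, `aggregate`, and `build_tree` are hypothetical, the density scores are taken as given rather than computed from text, leaves are plain float vectors rather than gist-token hidden states, and an element-wise mean stands in for the learned lightweight aggregation module. The sketch aggregates bottom-up only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    """A node of the illustrative binary tree (hypothetical structure)."""
    vector: List[float]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def allocate_gists(densities: Sequence[float], budget: int) -> List[int]:
    """Split a gist-token budget across segments in proportion to density.

    Each segment receives at least one gist token; the remainder is shared
    proportionally, with leftovers going to the largest fractional parts.
    """
    n = len(densities)
    if n == 0 or budget < n:
        raise ValueError("budget must allow one gist token per segment")
    spare = budget - n
    total = sum(densities)
    if total <= 0:
        shares = [spare / n] * n  # no density signal: split evenly
    else:
        shares = [spare * d / total for d in densities]
    counts = [1 + int(s) for s in shares]
    leftover = budget - sum(counts)
    by_fraction = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def aggregate(left: List[float], right: List[float]) -> List[float]:
    """Element-wise mean: an assumed stand-in for a learned aggregator."""
    return [(a + b) / 2 for a, b in zip(left, right)]


def build_tree(leaves: List[Node]) -> Node:
    """Merge adjacent nodes pairwise, level by level, until a root remains."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = list(leaves)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            l, r = level[i], level[i + 1]
            merged.append(Node(aggregate(l.vector, r.vector), l, r))
        if len(level) % 2:
            merged.append(level[-1])  # unpaired node is carried up unchanged
        level = merged
    return level[0]


# Example: three segments with assumed density scores and a budget of 8.
counts = allocate_gists([1, 1, 2], 8)
root = build_tree([Node([0.0]), Node([2.0]), Node([4.0])])
```

In this toy setting, denser segments receive more leaves, so the tree is finer-grained where the input carries more information, while nodes closer to the root hold coarser summaries.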

Paper Structure

This paper contains 50 sections, 14 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Pre-experiments demonstrate that existing methods struggle to balance semantic information of different dimensions for different types of tasks.
  • Figure 2: Comparison of compression frameworks. All methods generate responses conditioned on the representation in green. Unlike other methods, which compress in a linear manner, AdmTree dynamically balances context compression both horizontally and vertically through adaptive gist token allocation and tree hierarchy. Meanwhile, bidirectional aggregation mitigates information degradation.
  • Figure 3: Model performance under varying context lengths and compression ratios.
  • Figure 4: Comparison of Needle-in-The-Haystack (Kamradt, 2024) results with LLaMA-2-7B as the backbone LLM.
  • Figure 5: Retrieval accuracy comparison on the Topic Retrieval dataset under different context lengths and compression ratios.
  • ...and 2 more figures