AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Yangning Li, Shaoshen Chen, Yinghui Li, Yankai Chen, Hai-Tao Zheng, Hui Wang, Wenhao Jiang, Philip S. Yu
TL;DR
AdmTree introduces adaptive hierarchical compression for long-context processing by dynamically segmenting input based on information density, inserting gist tokens as leaves, and building a semantic binary tree with a lightweight aggregation module. By freezing the backbone LLM and training only lightweight components for gist-attention, embeddings, and aggregation, AdmTree achieves strong semantic fidelity with efficient inference. Comprehensive experiments on LongBench and dynamic dialogue settings demonstrate state-of-the-art performance across multiple tasks and robust scalability to varying context lengths and compression ratios, with ablations highlighting the value of the tree structure and adaptive gist allocation. The work also provides interpretability via tree-node attention patterns, suggesting future extensions such as mixture-of-experts for task-specific compression.
Abstract
The quadratic complexity of self-attention constrains Large Language Models (LLMs) in processing long contexts, a capability essential for many advanced applications. Context compression aims to alleviate this computational bottleneck while retaining critical semantic information. However, existing approaches often fall short: explicit methods may compromise local detail, whereas implicit methods can suffer from positional biases, information degradation, or an inability to capture long-range semantic dependencies. We propose AdmTree, a novel framework for adaptive, hierarchical context compression with a central focus on preserving high semantic fidelity while maintaining efficiency. AdmTree dynamically segments input based on information density, utilizing gist tokens to summarize variable-length segments as the leaves of a semantic binary tree. This structure, together with a lightweight aggregation mechanism and a frozen backbone LLM (thereby minimizing new trainable parameters), enables efficient hierarchical abstraction of the context. By preserving fine-grained details alongside global semantic coherence, mitigating positional bias, and dynamically adapting to content, AdmTree robustly retains the semantic information of long contexts.
