LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park, Paul Hongsuck Seo
TL;DR
LCIRC extends the context window of LLMs by recurrently compressing long-form input into compact representations and reinjecting them via gated cross-attention, enabling efficient processing of sequences far beyond the model's native context length. The framework is augmented with Query Dependent LCIRC (QD-LCIRC), which conditionally preserves query-relevant information through an additional gating mechanism, improving performance on long-context tasks. Empirical results show substantial perplexity improvements and strong gains on ultra-long benchmarks (e.g., InfiniteBench, LongBench) compared to baselines such as ExtendedFA and AutoCompressor, with QD-LCIRC delivering the best average performance. Together, these contributions offer a scalable approach to long-context reasoning, with practical implications for document understanding, long-form QA, and other tasks that require both extensive context and precise query relevance.
Abstract
While large language models (LLMs) excel at generating coherent and contextually rich outputs, their capacity to efficiently handle long-form contexts is limited by fixed-length position embeddings. Additionally, the computational cost of processing long sequences grows quadratically with sequence length, making it challenging to extend context length. To address these challenges, we propose Long-form Context Injection with Recurrent Compression (LCIRC), a method that enables efficient processing of long-form sequences beyond the model's length limit through recurrent compression, without retraining the entire model. We further introduce query dependent context modeling, which selectively compresses query-relevant information, ensuring that the model retains the most pertinent content. Our empirical results demonstrate that Query Dependent LCIRC (QD-LCIRC) significantly improves an LLM's ability to manage extended contexts, making it well-suited for tasks that require both comprehensive context understanding and query relevance.
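To make the cost argument concrete, the following is a rough back-of-the-envelope sketch, not the paper's own accounting. It counts query-key pairs scored by dense self-attention over the full input and compares that with a simplified recurrent scheme in which a fixed number of compressed feature slots cross-attend to one segment at a time. The function names, the `segment_len` and `n_slots` parameters, and the cost model itself are illustrative assumptions.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense self-attention over the whole input."""
    return n_tokens * n_tokens


def recurrent_compression_pairs(n_tokens: int, segment_len: int, n_slots: int) -> int:
    """Query-key pairs scored when n_slots compressed features are updated
    once per segment by cross-attending to that segment's tokens.

    Assumed cost model (hypothetical): each segment costs
    n_slots * (tokens in that segment); everything else is ignored.
    """
    total = 0
    for start in range(0, n_tokens, segment_len):
        tokens_in_segment = min(segment_len, n_tokens - start)
        total += n_slots * tokens_in_segment
    return total


if __name__ == "__main__":
    n = 131072  # example input length in tokens (assumed value)
    dense = full_attention_pairs(n)
    recurrent = recurrent_compression_pairs(n, segment_len=2048, n_slots=64)
    print(f"dense: {dense}, recurrent: {recurrent}, ratio: {dense // recurrent}")
```

Under this simplified model, the dense count grows with the square of the input length, while the recurrent count grows linearly, since the number of slots stays fixed as the input gets longer.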
