InfiniteICL: Breaking the Limit of Context Window Size via Long Short-term Memory Transformation

Bowen Cao, Deng Cai, Wai Lam

TL;DR

InfiniteICL is a framework that treats the context and parameters of an LLM as analogues of short- and long-term memory in human cognition, and transforms temporary context knowledge into permanent parameter updates. This significantly reduces memory usage, keeps performance robust across input lengths, and theoretically enables the integration of unbounded context.

Abstract

In-context learning (ICL) is critical for large language models (LLMs), but its effectiveness is constrained by finite context windows, particularly in ultra-long contexts. To overcome this, we introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory in human cognitive systems, focusing on transforming temporary context knowledge into permanent parameter updates. This approach significantly reduces memory usage, maintains robust performance across varying input lengths, and theoretically enables infinite context integration through the principles of context knowledge elicitation, selection, and consolidation. Evaluations demonstrate that our method reduces context length by 90% while achieving, on average, 103% of the performance of full-context prompting across fact recall, grounded reasoning, and skill acquisition tasks. When conducting sequential multi-turn transformations on complex, real-world contexts (up to 2M tokens long), our approach surpasses full-context prompting while using only 0.4% of the original context. These findings highlight InfiniteICL's potential to enhance the scalability and efficiency of LLMs by overcoming the limits of conventional context window sizes.
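The two compression figures in the abstract are easy to misread, so a short worked calculation may help. It assumes that "retention ratio" (the term used in the Figure 2 caption) means retained tokens divided by original tokens; the 100K-token length in the first call is an illustrative value, not a number from the paper, and `retained_tokens` is a hypothetical helper.

```python
def retained_tokens(context_tokens: int, retention_ratio: float) -> int:
    """Tokens kept when only a fraction `retention_ratio` of the context is retained.

    Assumption: retention ratio = retained tokens / original tokens.
    """
    return round(context_tokens * retention_ratio)


# "Reduces context length by 90%" corresponds to a retention ratio of 0.10.
# 100K tokens is an illustrative length, not taken from the paper.
print(retained_tokens(100_000, 0.10))     # 10000

# "Only 0.4% of the original context", taking the 2M-token upper bound as the base.
print(retained_tokens(2_000_000, 0.004))  # 8000
```

Read this way, even the longest 2M-token inputs would correspond to roughly 8K retained tokens, which fits in a conventional context window.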

Paper Structure

This paper contains 45 sections, 5 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: The core idea of our framework. The context window is refreshed after each transformation ($T_i: \theta_{i-1}+C_i\rightarrow \theta_i$), allowing infinite context input in a streaming fashion.
  • Figure 2: Average model performance on single transformation tasks across different retention ratios. For readability, only the results of representative baselines are shown.
  • Figure 3: Performance comparison across different context lengths in sequential transformation tasks. For readability, only the results of representative baselines are shown.
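The streaming scheme in Figure 1 ($T_i: \theta_{i-1}+C_i\rightarrow \theta_i$) can be sketched as a loop in which each chunk $C_i$ enters the context window, is consolidated into the parameters, and is then dropped. This is a toy stand-in, not the paper's method: the "parameters" are just a weight vector $(w, b)$, and "consolidation" is plain gradient descent fitting $y \approx wx + b$ on the current chunk, starting from $\theta_{i-1}$. The names `transform` and `stream` are hypothetical. What the sketch does show is the memory property: the context held at any moment is a single chunk, however long the stream grows.

```python
from typing import Iterable, List, Tuple

# Toy assumption: one (x, y) pair stands in for a single piece of context knowledge.
Example = Tuple[float, float]


def transform(theta: List[float], chunk: List[Example],
              lr: float = 0.1, steps: int = 300) -> List[float]:
    """Hypothetical T_i: theta_{i-1} + C_i -> theta_i.

    Toy consolidation: starting from the previous parameters, run gradient
    descent on the mean squared error of y ~ w*x + b over the chunk.
    """
    w, b = theta
    n = len(chunk)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in chunk]
        grad_w = 2 * sum(r * x for r, (x, _) in zip(residuals, chunk)) / n
        grad_b = 2 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]


def stream(chunks: Iterable[List[Example]],
           theta0: List[float]) -> Tuple[List[float], int]:
    """Consume chunks one at a time; return final parameters and peak window size."""
    theta = theta0
    peak_window = 0
    for chunk in chunks:
        window = list(chunk)  # the "context window" holds only the current chunk
        peak_window = max(peak_window, len(window))
        theta = transform(theta, window)
        # `window` is rebound on the next iteration: earlier chunks survive
        # only through theta, never as raw context.
    return theta, peak_window


# Usage: 20 chunks of 5 points each, all drawn from y = 2x + 1 with x in [0, 0.9].
chunks = (
    [((i + j) % 10 / 10, 2 * ((i + j) % 10 / 10) + 1) for j in range(5)]
    for i in range(20)
)
theta, peak = stream(chunks, [0.0, 0.0])
print([round(v, 3) for v in theta], peak)  # [2.0, 1.0] 5
```

The peak window stays at 5 examples no matter how many chunks are streamed, while knowledge from all of them accumulates in `theta`. Swapping the toy regressor for a language model, and the gradient step for the paper's elicitation, selection, and consolidation stages, is where the actual method lies; the sketch only illustrates the control flow.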