Human-inspired Episodic Memory for Infinite Context LLMs
Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang
TL;DR
This work addresses the difficulty of maintaining coherence over extremely long contexts in LLMs by introducing EM-LLM, a memory-augmented framework inspired by human episodic memory and event cognition. It forms memories via surprise-driven event boundaries, refines these boundaries with graph-theoretic metrics, and retrieves memories through a two-stage process that combines similarity and temporal contiguity, all without fine-tuning. Empirically, EM-LLM consistently outperforms the state-of-the-art retrieval model InfLLM on LongBench and ∞-Bench, outperforms RAG across a wide range of tasks, surpasses full-context models on most tasks, and can retrieve from contexts of up to $10^7$ tokens, a scale that is computationally infeasible for full-context models. The work also shows strong correlations between EM-LLM’s event segmentation and human-perceived events, suggesting cognitive parallels and offering a computational framework for studying human memory mechanisms alongside practical gains in AI systems.
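To make the idea of surprise-driven event boundaries concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `surprise_boundaries`, the sliding-window threshold (recent mean plus `gamma` standard deviations), and the parameters `window` and `gamma` are assumptions chosen for illustration, and the graph-theoretic refinement step is omitted. Surprise is taken to be the negative log-probability the model assigned to each observed token.

```python
import math
from statistics import mean, pstdev


def surprise_boundaries(token_probs, window=4, gamma=1.0):
    """Return token indices at which a new event is assumed to start.

    token_probs: probability the model assigned to each observed token.
    window, gamma: hypothetical hyperparameters of this sketch.
    """
    # Surprise of a token is its negative log-probability.
    surprise = [-math.log(p) for p in token_probs]
    boundaries = [0]  # the first token always opens an event
    for t in range(1, len(surprise)):
        recent = surprise[max(0, t - window):t]
        # Assumed rule: a token is a boundary if it is notably more
        # surprising than the tokens in the recent window.
        threshold = mean(recent) + gamma * pstdev(recent)
        if surprise[t] > threshold:
            boundaries.append(t)
    return boundaries


# Usage: one very unlikely token in an otherwise predictable stream.
probs = [0.9, 0.9, 0.9, 0.01, 0.9, 0.9]
print(surprise_boundaries(probs))
```

In this toy stream only the low-probability token opens a new event. Because the threshold adapts to recent surprise, the predictable tokens that follow it are not flagged.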
Abstract
Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs with no fine-tuning, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient, human-inspired access to relevant information. Experiments on the LongBench and $\infty$-Bench benchmarks demonstrate EM-LLM's superior performance, consistently outperforming the state-of-the-art retrieval model InfLLM across various baseline LLMs. In addition, EM-LLM outperforms its popular counterpart, RAG, in a wide range of tasks, while requiring similar resources. Notably, EM-LLM's performance even surpasses full-context models in most tasks, while successfully performing retrieval across 10 million tokens -- a scale computationally infeasible for such models. Finally, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting parallels between this artificial system and its biological counterpart, thereby offering a novel computational framework for exploring human memory mechanisms.
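The two-stage retrieval described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the paper's implementation: each stored event is assumed to be summarised by a single key vector, similarity is assumed to be a dot product, and the names `retrieve_events`, `k_sim`, and `n_neighbors` are hypothetical. Stage one picks the events most similar to the query; stage two adds their temporal neighbours, so that events stored next to a relevant one are recalled with it.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_events(query, event_keys, k_sim=1, n_neighbors=1):
    """Return sorted indices of events selected for the query.

    event_keys: one representative vector per event, in temporal order
                (an assumption of this sketch).
    k_sim: number of events chosen by similarity (stage one).
    n_neighbors: how many events on each side to add (stage two).
    """
    scores = [dot(query, key) for key in event_keys]
    # Stage one: rank events by similarity to the query, keep the top k_sim.
    ranked = sorted(range(len(event_keys)), key=lambda i: scores[i], reverse=True)
    similar = ranked[:k_sim]
    # Stage two: add temporally adjacent events around each similar event.
    selected = set(similar)
    for i in similar:
        for j in range(i - n_neighbors, i + n_neighbors + 1):
            if 0 <= j < len(event_keys):
                selected.add(j)
    return sorted(selected)


# Usage: only event 2 matches the query, but its neighbours come along.
keys = [[0, 1], [0, 1], [1, 0], [0, 1], [0, 1]]
print(retrieve_events([1, 0], keys, k_sim=1, n_neighbors=1))
```

Returning indices in temporal order keeps the retrieved events in the order in which they were originally observed, which is one simple way to preserve contiguity when they are placed back into the context.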
