MemLong: Memory-Augmented Retrieval for Long Text Modeling
Weijie Liu, Zecheng Tang, Juntao Li, Kehai Chen, Min Zhang
TL;DR
MemLong targets long-context language modeling by pairing a pre-trained, partially trainable decoder-only model with an external retriever and a memory bank that stores chunk-level key-value (K-V) pairs together with the chunk representations used for retrieval. A retrieval causal attention mechanism fuses the retrieved memory with the local context in the upper layers, while the lower layers stay frozen. This extends the usable context from 4k to 80k tokens on a single 3090 GPU. MemLong yields consistent perplexity improvements across long-context language modeling benchmarks and improves in-context learning when demonstrations are held in memory, with favorable efficiency because memory is updated selectively. Since the base model is not retrained wholesale, the approach offers a practical route to long-document processing and retrieval-augmented generation with much longer context windows in decoder-only LLMs.
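The memory bank described above can be pictured with a minimal sketch. It is not the authors' implementation: the `ChunkMemory` class, its method names, and the use of cosine similarity for ranking are assumptions made for illustration, and the K-V pairs are stand-in strings rather than real tensors.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class ChunkMemory:
    """Toy memory bank with one entry per text chunk (hypothetical structure)."""
    embeddings: list = field(default_factory=list)  # retrieval representation per chunk
    kv_pairs: list = field(default_factory=list)    # cached (keys, values) per chunk

    def add(self, embedding, keys, values):
        # Store the chunk representation alongside its K-V pairs.
        self.embeddings.append(list(embedding))
        self.kv_pairs.append((keys, values))

    def retrieve(self, query, top_k=2):
        # Rank stored chunks by similarity to the query and return the best top_k.
        scores = [cosine(query, e) for e in self.embeddings]
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(i, self.kv_pairs[i]) for i in order[:top_k]]


# Usage: three stored chunks, then a query that is closest to chunk 0.
memory = ChunkMemory()
memory.add([1.0, 0.0], keys="K0", values="V0")
memory.add([0.0, 1.0], keys="K1", values="V1")
memory.add([0.7, 0.7], keys="K2", values="V2")
hits = memory.retrieve([1.0, 0.1], top_k=2)
```

In this sketch the retriever only selects which cached K-V pairs to hand to the model; fusing them with the local context is left to the attention layers and is not shown here.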
Abstract
Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs because of the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Modeling, a method that enhances long-context language modeling by using an external retriever to retrieve historical information. MemLong combines a non-differentiable "ret-mem" module with a partially trainable decoder-only language model and introduces a fine-grained, controllable retrieval attention mechanism that leverages semantically relevant chunks. Comprehensive evaluations on multiple long-context language modeling benchmarks demonstrate that MemLong consistently outperforms other state-of-the-art LLMs. More importantly, MemLong can extend the context length on a single 3090 GPU from 4k up to 80k. Our code is available at https://github.com/Bui1dMySea/MemLong
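To make the key-value cache growth mentioned in the abstract concrete, here is a small back-of-the-envelope sketch. The model shape is hypothetical and chosen only for illustration; it is not taken from the paper. The formula assumes standard multi-head attention with one key tensor and one value tensor cached per layer, stored in 16-bit precision.

```python
def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values, in every layer.
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model shape, for illustration only.
shape = dict(n_layers=32, n_heads=32, head_dim=128)

for tokens in (4_000, 80_000):
    gib = kv_cache_bytes(tokens, **shape) / 2**30
    print(f"{tokens:>6} tokens -> {gib:.1f} GiB of fp16 K-V cache")
```

The cache grows linearly with sequence length, so a 20x longer context needs 20x the cache memory under these assumptions. This is the pressure that motivates keeping older context in an external memory and retrieving only the relevant chunks.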
