R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu, Bang Liu
TL;DR
The paper tackles the memory bottleneck in large language models by proposing R$^3$Mem, a memory network that fuses retention and retrieval in a reversible architecture to handle indefinitely long histories. It uses hierarchical compression with virtual memory tokens to encode long contexts and an adapter-based reversible Transformer to reconstruct raw content, trained with bidirectional, cycle-consistent losses. Empirically, it achieves state-of-the-art perplexity on long-context benchmarks and strong retrieval-augmented generation results, and it is also integrated into a real-world conversational agent. The approach reduces external storage needs and enables scalable lifelong memory, though it leaves open questions about out-of-domain generalization and the stability of memory updates.
Abstract
Memory plays a key role in enhancing LLMs' performance in real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information in parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context compression. Specifically, R$^3$Mem employs virtual memory tokens to compress and encode arbitrarily long histories, further enhanced by a hierarchical compression strategy that refines information from the document level to the entity level for improved assimilation across granularities. For retrieval, R$^3$Mem employs a reversible architecture that reconstructs raw data by running the model backward on the compressed information. Implemented via parameter-efficient fine-tuning, it can be integrated with any Transformer-based model. Experiments demonstrate that our memory design achieves state-of-the-art performance in long-context language modeling and retrieval-augmented generation tasks. It also significantly outperforms conventional memory modules in long-horizon interaction tasks such as conversational agents, showcasing its potential for next-generation retrieval systems.
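Retrieval in this design relies on reconstructing raw data by running the model backward, which requires layers that can be inverted. A minimal sketch, assuming the standard RevNet-style additive coupling (not necessarily the construction R$^3$Mem actually uses): each block splits its input into two halves and updates them so that the inputs can be recovered exactly from the outputs, even though the sub-networks `F` and `G` are not themselves invertible. The helper names, the random tanh sub-networks standing in for adapter layers, and the dimensions are all hypothetical; the sketch illustrates invertibility and a round-trip reconstruction loss only, and does not model virtual memory tokens or compression.

```python
import math
import random


def make_subnet(dim, seed):
    """Hypothetical stand-in for an adapter sub-layer: a fixed random tanh layer.

    In additive coupling, F and G do NOT need to be invertible themselves.
    """
    rng = random.Random(seed)
    w = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]

    def subnet(v):
        return [math.tanh(sum(w[i][j] * v[j] for j in range(dim))) for i in range(dim)]

    return subnet


def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]


def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]


class ReversibleBlock:
    """RevNet-style additive coupling: invertible by construction."""

    def __init__(self, dim, seed=0):
        self.F = make_subnet(dim, seed)
        self.G = make_subnet(dim, seed + 1)

    def forward(self, x1, x2):
        y1 = vec_add(x1, self.F(x2))  # y1 = x1 + F(x2)
        y2 = vec_add(x2, self.G(y1))  # y2 = x2 + G(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = vec_sub(y2, self.G(y1))  # undo the second update first
        x1 = vec_sub(y1, self.F(x2))  # then undo the first
        return x1, x2


def forward_stack(blocks, x1, x2):
    for block in blocks:
        x1, x2 = block.forward(x1, x2)
    return x1, x2


def inverse_stack(blocks, y1, y2):
    # Invoke the same blocks backward, in reverse order.
    for block in reversed(blocks):
        y1, y2 = block.inverse(y1, y2)
    return y1, y2


def reconstruction_loss(x, x_hat):
    """Mean squared error between an input and its round-trip reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


if __name__ == "__main__":
    blocks = [ReversibleBlock(dim=4, seed=2 * k) for k in range(3)]
    x1, x2 = [0.1, -0.2, 0.3, 0.5], [1.0, 0.0, -1.0, 0.25]
    y1, y2 = forward_stack(blocks, x1, x2)  # forward pass
    r1, r2 = inverse_stack(blocks, y1, y2)  # backward pass recovers the input
    print(reconstruction_loss(x1 + x2, r1 + r2) < 1e-12)  # True (up to float rounding)
```

Here the round trip is exact up to floating-point rounding because the forward map never discards information. Once the forward direction genuinely compresses a long history into a small set of memory tokens, exact inversion is no longer guaranteed by construction, which is presumably why a reconstruction (cycle-consistency) objective is needed during training.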
