R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

Xiaoqiang Wang, Suyuchen Wang, Yun Zhu, Bang Liu

TL;DR

The paper tackles the memory bottleneck in large language models by proposing R$^3$Mem, a memory network that fuses retention and retrieval in a reversible architecture to handle indefinitely long histories. It uses hierarchical compression with virtual memory tokens to encode long contexts and an adapter-based reversible Transformer to reconstruct raw content, trained with bidirectional, cycle-consistent losses. Empirically, it achieves state-of-the-art perplexity on long-context benchmarks and strong retrieval-augmented generation, including integration into a real-world conversational agent. The approach reduces external storage needs and enables scalable lifelong memory, though it raises questions about out-of-domain generalization and stability of memory updates.

Abstract

Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context compression. Specifically, R$^3$Mem employs virtual memory tokens to compress and encode infinitely long histories, further enhanced by a hierarchical compression strategy that refines information from document- to entity-level for improved assimilation across granularities. For retrieval, R$^3$Mem employs a reversible architecture, reconstructing raw data by invoking the model backward with compressed information. Implemented via parameter-efficient fine-tuning, it can integrate seamlessly with any Transformer-based model. Experiments demonstrate that our memory design achieves state-of-the-art performance in long-context language modeling and retrieval-augmented generation tasks. It also significantly outperforms conventional memory modules in long-horizon interaction tasks like conversational agents, showcasing its potential for next-generation retrieval systems.
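To make "invoking the model backward" concrete, here is a minimal sketch of a generic reversible block built from additive coupling. It is an illustration under assumptions, not the paper's implementation: the abstract does not specify the coupling scheme, so the additive form is a standard stand-in, and the names `f`, `g`, `forward`, and `backward` are hypothetical placeholders for the real sub-layers.

```python
import math

def f(x):
    # Hypothetical sub-function F; stands in for one sub-layer of the block.
    return [math.tanh(v) for v in x]

def g(x):
    # Hypothetical sub-function G; stands in for a second sub-layer.
    return [0.5 * v * v for v in x]

def forward(x1, x2):
    # Additive coupling: each half of the input is shifted by a function
    # of the other half. F and G themselves need not be invertible.
    y1 = [a + b for a, b in zip(x1, f(x2))]
    y2 = [a + b for a, b in zip(x2, g(y1))]
    return y1, y2

def backward(y1, y2):
    # Undo the two shifts in reverse order to recover the inputs.
    x2 = [a - b for a, b in zip(y2, g(y1))]
    x1 = [a - b for a, b in zip(y1, f(x2))]
    return x1, x2

# Round trip: the backward pass reconstructs the forward inputs
# up to floating-point rounding.
y1, y2 = forward([0.1, -0.2], [0.3, 0.4])
x1, x2 = backward(y1, y2)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(x1, [0.1, -0.2]))
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(x2, [0.3, 0.4]))
```

The point of the sketch is that the same parameters serve both directions, which is the property a reversible memory would rely on to map compressed representations back toward the original content.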

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Comparison between explicit memory, implicit memory, and our proposed R$^3$Mem memory design.
  • Figure 2: Overview of R$^3$Mem's architecture: The model employs a reversible framework that integrates context compression and expansion mechanisms. In the forward model, raw textual data is hierarchically encoded into compact representations at various levels (document, paragraph, and entity) using virtual memory tokens. In the backward model, the original information is reconstructed by reversing the compression process.
  • Figure 3: The architecture of the reversible Transformer. Left: The general reversible neural architecture. Right: The components of the reversible Transformer.
  • Figure 4: RAG performance on the UltraDomain dataset in terms of in-domain and out-of-domain settings.
  • Figure 5: Evaluation of memory retrieval and response generation when integrating R$^3$Mem into the SiliconFriend conversational agent. The overall score represents the average across all four evaluation metrics. Scores are re-scaled using min-max normalization for each metric to enhance clarity.
  • ...and 3 more figures