RAM-Net: Expressive Linear Attention with Selectively Addressable Memory

Kaicheng Xiao, Haotian Li, Liran Dong, Guoliang Xing

TL;DR

RAM-Net tackles the expressivity-memory trade-off in sequence models by introducing a differentiable address decoder that maps dense inputs to high-dimensional sparse addresses, enabling selective access to a massive memory state without increasing parameters. The architecture combines a $U$-order Product Softmax address space with Top-$K$ sparsity (yielding $M=(d_p)^U$ slots), a Cyclic Address Positional Embedding (CAPE) that injects temporal structure, and PDMA memory updates that decouple forgetting from writing. Empirically, RAM-Net delivers superior long-range retrieval on synthetic MQAR tasks and competitive language modeling and zero-shot reasoning performance on large-scale data, while keeping computation efficient through sparse reads and writes. These results suggest that RAM-Net can achieve high-fidelity, fine-grained memory-based reasoning with significantly reduced inference overhead, offering a scalable path for memory-intensive sequence tasks.
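The Product Softmax address space with Top-$K$ sparsity can be illustrated with a short NumPy sketch. It rests on assumptions that this summary does not spell out: the dense input is split into $U$ equal partitions of width $d_p$, each partition gets its own softmax, the address over all $M=(d_p)^U$ slots is the outer product of the $U$ partition distributions, and the kept Top-$K$ weights are left unrenormalized. The function name `product_softmax_address` is hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def product_softmax_address(x, U, d_p, K):
    """Map a dense vector x (length U * d_p) to a sparse address over M = d_p**U slots.

    Hypothetical sketch: per-partition softmax, outer product across the U
    partitions, then Top-K truncation. Returns (slot indices, weights),
    sorted by descending weight. Kept weights are NOT renormalized.
    """
    assert x.shape == (U * d_p,)
    parts = x.reshape(U, d_p)
    # Numerically stable softmax within each partition.
    parts = np.exp(parts - parts.max(axis=1, keepdims=True))
    parts /= parts.sum(axis=1, keepdims=True)
    # Outer product across partitions; flat index is base-d_p with
    # partition 0 as the most significant digit.
    dense = parts[0]
    for p in parts[1:]:
        dense = np.outer(dense, p).ravel()
    # Top-K truncation: keep the K largest weights.
    idx = np.argpartition(dense, -K)[-K:]
    idx = idx[np.argsort(-dense[idx])]
    return idx, dense[idx]


# Example: the toy setting U=3, d_p=2 (M=8 slots), keeping K=2 slots.
x = np.random.default_rng(0).normal(size=6)
idx, w = product_softmax_address(x, U=3, d_p=2, K=2)
print(idx, w)  # two slot indices in [0, 8) and their weights
```

For clarity the sketch materializes all $M$ weights, which an efficient implementation would avoid. Because every weight is a product of positive per-partition probabilities, an exact Top-$K$ can always be found inside the Cartesian product of each partition's own Top-$K$, so at most $K^U$ candidates need to be scored.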

Abstract

While linear attention architectures offer efficient inference, compressing unbounded history into a fixed-size memory inherently limits expressivity and causes information loss. To address this limitation, we introduce Random Access Memory Network (RAM-Net), a novel architecture designed to bridge the gap between the representational capacity of full attention and the memory efficiency of linear models. At its core, RAM-Net maps inputs to high-dimensional sparse vectors that serve as explicit addresses, allowing the model to selectively access a massive memory state. This design lets the state size scale exponentially without additional parameters, which significantly mitigates signal interference and improves retrieval fidelity. Moreover, the inherent sparsity makes computation highly efficient, since each state update touches only a small number of entries. Extensive experiments demonstrate that RAM-Net consistently surpasses state-of-the-art baselines in fine-grained long-range retrieval tasks and achieves competitive performance on standard language modeling and zero-shot commonsense reasoning benchmarks, validating its superior capability to capture complex dependencies with significantly reduced computational overhead.
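The sparse read and write path described in the abstract can be sketched as follows. A sparse address is represented as $K$ slot indices plus their weights, and the state is an $M \times d_v$ array. This is a hypothetical illustration: the class name is invented, writes are plain weighted additions, and RAM-Net's PDMA forgetting mechanism is deliberately omitted because its form is not given in this summary. Only the $K$ addressed rows are read or modified per token, in contrast to the dense update $S \leftarrow S + \mathbf{k}_t\mathbf{v}_t^\top$ of vanilla linear attention, which touches every entry of the state.

```python
import numpy as np


class SparseAddressMemory:
    """Hypothetical memory state with M slots of width d_v, accessed via sparse addresses.

    Per-token cost is O(K * d_v) regardless of M. No decay or forgetting is
    modeled (RAM-Net's PDMA update is not reproduced here).
    """

    def __init__(self, num_slots, d_v):
        self.state = np.zeros((num_slots, d_v))

    def write(self, idx, weights, v):
        # Weighted additive write of value v into the K addressed slots.
        # np.add.at stays correct even if idx contained duplicates.
        np.add.at(self.state, idx, weights[:, None] * v[None, :])

    def read(self, idx, weights):
        # Weighted sum over the K addressed slots.
        return weights @ self.state[idx]


# Two values written to different slots (K=1 each) in an 8-slot memory.
mem = SparseAddressMemory(num_slots=8, d_v=3)
mem.write(np.array([5]), np.array([1.0]), np.array([1.0, 2.0, 3.0]))
mem.write(np.array([2]), np.array([1.0]), np.array([-1.0, 0.0, 4.0]))
print(mem.read(np.array([5]), np.array([1.0])))  # → [1. 2. 3.]
```

Because the two values land in different slots, each is read back exactly. With a dense outer-product state, reading with the first key would also pick up a share of the second value unless the two keys were orthogonal.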

Paper Structure (22 sections, 7 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: Comparison of memory mechanisms. (a) Full Attention: retains the entire history for retrieval, so memory grows linearly with sequence length. (b) Linear Attention: compresses history into a fixed-size state; its reliance on kernel-based similarity limits capacity and causes unavoidable interference. (c) RAM-Net: decouples memory capacity from feature dimension via an Address Decoder, which maps the dense vectors $\mathbf{k}_t$ and $\mathbf{q}_t$ into high-dimensional sparse addresses $\mathbf{w}_t$ and $\mathbf{r}_t$. This enables massive state capacity and high-fidelity retrieval with a constant-size memory state.
  • Figure 2: Overview of the RAM-Net architecture. The Address Decoder transforms the $\mathbf{k}_t$ and $\mathbf{q}_t$ vectors into high-dimensional sparse addresses via Product Softmax, Top-$K$ truncation, and Cyclic Address Positional Embedding (CAPE). For visual clarity, we illustrate a simplified configuration with $U=3$ partitions and sub-dimension $d_p=2$, which gives a total memory capacity of $M=(d_p)^U=2^3=8$ slots, with selection sparsity $K=1$.
  • Figure 3: MQAR accuracy vs. total state size.
  • Figure 4: Ablation study of product softmax order $U$.
  • Figure 5: Visualization of memory access traces: read (green) and write (red) events across memory slots over time (tokens).
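The arithmetic behind the toy configuration in Figure 2, and behind the claim of exponential state scaling, can be checked with a few lines of Python. The sketch assumes that a flat slot index is a base-$d_p$ number whose $U$ digits select one coordinate per partition (partition 0 most significant), and that the address projection has a fixed width of $U \cdot d_p$. Both conventions and the helper names `num_slots` and `slot_to_partition_indices` are illustrative, not taken from the paper.

```python
def num_slots(U, d_p):
    """Total number of addressable slots, M = d_p ** U."""
    return d_p ** U


def slot_to_partition_indices(slot, U, d_p):
    """Split a flat slot index into U per-partition indices (base-d_p digits).

    Assumed convention: partition 0 is the most significant digit.
    """
    digits = []
    for _ in range(U):
        slot, r = divmod(slot, d_p)
        digits.append(r)
    return digits[::-1]


# Figure 2 configuration: U = 3 partitions of width d_p = 2 -> M = 8 slots.
U, d_p = 3, 2
for slot in range(num_slots(U, d_p)):
    print(slot, slot_to_partition_indices(slot, U, d_p))

# Same projection width (U * d_p = 64), very different memory sizes.
for U, d_p in [(1, 64), (2, 32), (4, 16), (8, 8)]:
    print(f"U={U}, d_p={d_p}: M={num_slots(U, d_p):,}")
```

Under these assumptions, keeping the projection width at 64 while raising $U$ grows $M$ from 64 to 16,777,216 slots. With $K=1$, as in Figure 2, each token's key and query each select exactly one of the eight slots, identified by one choice per partition.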