Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Kazuki Irie, Morris Yau, Samuel J. Gershman
TL;DR
This work investigates integrating two memory paradigms, KV-memory with softmax attention (quadratic transformers) and FW-memory with linear attention (DeltaNet/linear transformers), to create Hybrid Quadratic-Linear Transformers (HQLTs) for general sequence processing. It introduces three blending schemes (Delayed-Streaming, Delayed-Chunk, and Synchronous) and systematically evaluates them on large-scale language modeling, expressivity benchmarks, in-context retrieval, and reinforcement learning in POMDPs. Across tasks, the Synchronous HQLT consistently yields the strongest overall performance: it processes inputs in KV- and FW-memory simultaneously, combining precise recall with expressive computation. The results show that a carefully synchronized hybrid can overcome the limitations of its individual components, and they inform memory design for future architectures.
Abstract
We develop hybrid memory architectures for general-purpose sequence processing neural networks that combine key-value memory using softmax attention (KV-memory) with fast weight memory through dynamic synaptic modulation (FW-memory), the core principles of quadratic and linear transformers, respectively. These two memory systems have complementary but individually limited properties: KV-memory offers precise retrieval but is constrained by quadratic complexity in sequence length, while FW-memory supports arbitrarily long sequences and enables more expressive computation but sacrifices precise recall. We propose and compare three methods to blend these two systems into a single memory system, differing in how and when input information is delivered to each system, to leverage the strengths of both. We conduct experiments on general language modeling and retrieval tasks by training 340M- and 1.3B-parameter models from scratch, as well as on synthetic algorithmic tasks designed to precisely illustrate the benefits of certain hybrid methods over others. We also evaluate our hybrid memory systems on reinforcement learning in partially observable environments. Overall, we demonstrate how a well-designed hybrid can overcome the limitations of its individual components, offering new insights into the design principles of neural memory systems.
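To make the contrast between the two memory systems concrete, the following is a minimal single-head sketch in plain Python, not the authors' implementation. The names `KVMemory`, `FWMemory`, `synchronous_step`, and the `mix` parameter are hypothetical. The fast weight update uses the standard delta rule associated with DeltaNet, and combining the two outputs with a fixed scalar weight is an assumption made purely for illustration; the paper's actual blending mechanism may differ.

```python
import math


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


class KVMemory:
    """Softmax-attention memory: keeps every (key, value) pair, so state grows with length."""

    def __init__(self):
        self.keys, self.values = [], []

    def write(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def read(self, q):
        # Scaled dot-product scores over all stored keys, then a stable softmax.
        scale = 1.0 / math.sqrt(len(q))
        scores = [dot(q, k) * scale for k in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        d_out = len(self.values[0])
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / z
                for j in range(d_out)]


class FWMemory:
    """Fast weight memory: a fixed-size matrix W, updated in place by the delta rule."""

    def __init__(self, d_key, d_value):
        self.W = [[0.0] * d_key for _ in range(d_value)]

    def write(self, k, v, beta=1.0):
        # Delta rule: W <- W + beta * (v - W k) k^T
        old = [dot(row, k) for row in self.W]  # value currently associated with k
        for i, row in enumerate(self.W):
            err = v[i] - old[i]
            for j in range(len(k)):
                row[j] += beta * err * k[j]

    def read(self, q):
        return [dot(row, q) for row in self.W]


def synchronous_step(kv, fw, q, k, v, mix=0.5):
    """Hypothetical synchronous blend: both memories see the same input at the same step.

    Assumption: outputs are combined with a fixed scalar weight `mix`.
    """
    kv.write(k, v)
    fw.write(k, v)
    y_kv, y_fw = kv.read(q), fw.read(q)
    return [mix * a + (1.0 - mix) * b for a, b in zip(y_kv, y_fw)]
```

In this sketch the `KVMemory` state grows by one entry per step and each read attends over all of them, which is the source of the quadratic cost in sequence length. The `FWMemory` state stays a fixed `d_value × d_key` matrix regardless of how many writes occur, at the price of interference when keys are not orthogonal.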
