Key-value memory in the brain

Samuel J. Gershman; Ila Fiete; Kazuki Irie

TL;DR

This paper frames memory as a key-value system that separates stored content (values) from memory addresses (keys), so that storage fidelity and retrieval discriminability can be optimized together. It links psychological, neuroscientific, and machine-learning perspectives, showing how hippocampal keys and neocortical values could support retrieval in a key-value framework and how self-attention and kernel methods embody this approach. Through simulations and theoretical synthesis, the authors illustrate distinct key and value representations, retrieval-interference effects, and forgetting as retrieval failure, with potential recovery via reactivation. The work offers a unifying view that aligns brain-inspired memory architectures with modern AI systems and suggests testable predictions about memory encoding, retrieval, and the reversible nature of certain memory failures.
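
The link to self-attention and kernel methods can be sketched as a soft lookup: a query is compared with every stored key, the similarities are passed through an exponential kernel and normalized (a softmax, as in Transformer attention), and the retrieved value is the weighted sum of stored values. This mirrors the right panel of Figure 1, with the rows of `keys` acting as input-to-hidden weights and the rows of `values` as hidden-to-output weights. The function name `kv_read`, the parameter `inv_temp`, and the softmax normalization are illustrative assumptions, not the paper's code; in a Transformer, keys, queries, and values would additionally be learned linear projections of the input.

```python
import numpy as np

def kv_read(query, keys, values, inv_temp=1.0):
    """Soft key-value lookup: weight each stored value by exp(inv_temp * <key, query>).

    keys:     (N, d_k) array, one stored key per row
    values:   (N, d_v) array, one stored value per row
    inv_temp: inverse temperature of the exponential kernel (hypothetical parameter)
    """
    scores = inv_temp * (keys @ query)        # dot-product similarity of the query to every key
    weights = np.exp(scores - scores.max())   # subtract the max for numerical stability
    weights /= weights.sum()                  # normalize: a softmax over stored items
    return weights @ values, weights          # retrieved value = weighted sum of stored values

# Usage: three stored items with one-hot keys; querying with key 1 retrieves value 1.
keys = np.eye(3, 4)
values = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v_hat, w = kv_read(keys[1], keys, values, inv_temp=10.0)
print(np.round(v_hat, 3), np.round(w, 3))
```

Swapping `np.exp` for another positive similarity function gives the more general kernel-regression reading; as `inv_temp` grows, the lookup approaches retrieval of the single nearest key.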

Abstract

Classical models of memory in psychology and neuroscience rely on similarity-based retrieval of stored patterns, where similarity is a function of retrieval cues and the stored patterns. While parsimonious, these models do not allow distinct representations for storage and retrieval, despite their distinct computational demands. Key-value memory systems, in contrast, distinguish representations used for storage (values) and those used for retrieval (keys). This allows key-value memory systems to optimize simultaneously for fidelity in storage and discriminability in retrieval. We review the computational foundations of key-value memory, its role in modern machine learning systems, related ideas from psychology and neuroscience, applications to a number of empirical puzzles, and possible biological implementations.
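
The contrast between classical similarity-based retrieval and key-value retrieval can be made concrete with a toy example. Two stored events have values that differ in only one of eight features. In the classical case the stored patterns double as retrieval addresses, so their overlap makes them hard to tell apart; in the key-value case the same values are addressed by separate keys. The cosine-similarity softmax readout and the orthogonal keys are assumptions chosen for illustration, not the paper's model.

```python
import numpy as np

def retrieve(query, keys, values, inv_temp=5.0):
    """Softmax readout over cosine similarities between the query and each stored key."""
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)   # unit-normalize each key
    q = query / np.linalg.norm(query)                        # unit-normalize the query
    scores = inv_temp * (k @ q)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ values, w

# Two stored events whose contents (values) overlap heavily: they differ in one feature.
values = np.array([[1, 1, 1, 1, 1, 1, 1, 1],
                   [1, 1, 1, 1, 1, 1, 1, -1]], dtype=float)

# Classical model: the stored patterns are also the retrieval addresses (keys == values).
v_hat_classic, w_classic = retrieve(values[0], values, values)

# Key-value model: hypothetical orthogonal keys address the same, unchanged values.
keys = np.eye(2)
v_hat_kv, w_kv = retrieve(keys[0], keys, values)

print(w_classic.round(3), w_kv.round(3))
```

Here the weight on the correct event is about 0.78 when storage and retrieval share one representation, versus about 0.99 with separate keys, while the values themselves are stored without modification.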

Paper Structure (19 sections, 13 equations, 3 figures, 1 table)

Introduction
Computational foundations of key-value memory
From correlations to kernels
Representational structure
The ubiquity of key-value memory
Neurobiological substrates
Evidence from psychology and neuroscience
Retrieval interference, not erasure, is the principal limiting factor in memory performance
Distinct representations of keys and values
Values, but not keys, are available for recall
Illustrative simulations
Distinct representations for keys and values
Forgetting as retrieval failure, and recovery by memory reactivation
Conclusions
Lead contact
...and 4 more sections

Figures (3)

Figure 1: Two architectures for key-value memory. Black symbols denote vectors and blue symbols denote matrices. (Left) Input $\mathbf{x}$ is mapped to key ($\mathbf{k}$), query ($\mathbf{q}$), and value ($\mathbf{v}$) vectors. During memory writing, the weight matrix $\mathbf{M}$ is updated using Hebbian learning between the key and value vectors. During reading, the query is projected onto $\mathbf{M}$ to produce a retrieved value $\hat{\mathbf{v}}$. (Right) The input vector is mapped to a hidden layer $\boldsymbol \alpha$, which is then mapped to an output layer $\hat{\mathbf{v}}$. The input-to-hidden weights correspond to the stored keys; the hidden-to-output weights correspond to the stored values.
Figure 2: Optimization of key and value representations. Each point represents an event in the memory and belongs to one of (A) two or (B) three classes, represented by different colors. In each case, the evolution of key (top row) and value (bottom row) representations during optimization is shown; each row shows (left) the random initialization, (middle) the trajectory of representations during optimization, with final positions marked by gray points, and (right) the final configuration. Keys become optimized for retrieval (separability), while values are optimized to store the memory content.
Figure 3: Forgetting and reactivation of memory events. A one-layer feedforward neural network is trained sequentially on two tasks, Task 1 and Task 2, constructed from the MNIST and FashionMNIST datasets, respectively. (A) Test classification accuracy for the two tasks as a function of training epochs. After epoch 5, the training data switch from Task 1 to Task 2, and the model forgets Task 1 as it learns Task 2. (B) Accuracy of the trained model on Task 1 as a function of the artificial scalar $\beta$ used to amplify the keys in all key-value memory pairs corresponding to Task 1 learning.
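
The Hebbian write and read of Figure 1 (left), together with the key amplification of Figure 3B, can be sketched as a toy outer-product memory. This is not the authors' experiment, which used a trained one-layer network on MNIST and FashionMNIST; here the memory matrix is built directly by Hebbian writes, Task 1 keys are assumed orthonormal, and Task 2 pairs with random, overlapping keys are written afterwards. All names (`write`, `read`, `build_memory`, `task1_recall`) and sizes are hypothetical.

```python
import numpy as np

def write(M, k, v):
    """Hebbian write (Figure 1, left): add the outer product of value and key."""
    return M + np.outer(v, k)

def read(M, q):
    """Read: project the query onto the memory matrix to get a retrieved value."""
    return M @ q

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d_k, d_v, n1, n2 = 32, 16, 8, 200

# Task 1: orthonormal keys (assumption), so Task 1 pairs do not interfere with each other.
keys1 = np.eye(d_k)[:n1]
vals1 = rng.standard_normal((n1, d_v))
# Task 2: random keys that overlap with the Task 1 keys, written after Task 1.
keys2 = rng.standard_normal((n2, d_k)) / np.sqrt(d_k)
vals2 = rng.standard_normal((n2, d_v))

def build_memory(beta, include_task2=True):
    """Sum of outer products; the stored Task 1 keys are amplified by the scalar beta."""
    M = np.zeros((d_v, d_k))
    for k, v in zip(keys1, vals1):
        M = write(M, beta * k, v)
    if include_task2:
        for k, v in zip(keys2, vals2):
            M = write(M, k, v)
    return M

def task1_recall(M):
    """Mean cosine similarity between retrieved and stored Task 1 values."""
    return np.mean([cosine(read(M, k), v) for k, v in zip(keys1, vals1)])

before = task1_recall(build_memory(beta=1.0, include_task2=False))
after = {beta: task1_recall(build_memory(beta)) for beta in (1.0, 2.0, 5.0)}
print(before, after)
```

Task 2 writes never remove the Task 1 pairs from `M`; they only add crosstalk when a Task 1 key is read. Scaling the stored Task 1 keys by `beta` therefore raises Task 1 recall without any relearning, and because the Task 1 keys are orthonormal in this toy, recall increases monotonically with `beta`.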
