Trained Persistent Memory for Frozen Decoder-Only LLMs

Hong Jeong

Abstract

Decoder-only language models are stateless: hidden representations are discarded after every forward pass and nothing persists across sessions. Jeong (2026a) showed that trained memory adapters give a frozen encoder-decoder backbone persistent latent-space memory, building on the lateral-memory framework of Jeong (2026b,c). Here we ask whether the same principle transfers to the decoder-only setting, where no cross-attention pathway exists and memory must enter through self-attention alone. We adapt six methods -- prefix, parallel cross-attention, KV extension, Hebbian memory, context-gated branch, and slot-based sparse write -- to a frozen GPT-2, training only a small adapter $\theta_{\text{mem}}$. The write rule is shared; only the read injection changes from decoder cross-attention to self-attention KV prefix or parallel branch. On LoCoMo we find a striking inductive-bias dichotomy: at $1\times$ capacity, three methods with strong architectural priors -- cross-attention (M.2), Hebbian (M.4), and slot write (M.6) -- achieve retained-memory scores of $7$--$18\%$ and knowledge gains $\Delta K$ of $7$--$10$, while the other three fail ($< 0.4\%$). At $10\times$ capacity all six converge, showing the gap is architectural, not fundamental. Together with the encoder-decoder results of Jeong (2026a) and the brain-inspired modules of Jeong (2026b,c), these findings establish persistent latent-space memory as a general paradigm spanning major transformer families.

Paper Structure (47 sections, 17 equations, 12 figures, 4 tables)


Introduction
Related Work
Persistent memory for LLMs.
KV caching and context extension.
Recurrent and compressive context extension.
Learned external memory.
Parameter-efficient adaptation.
Attention-coupled latent memory.
Problem Setting
Stateless decoder-only baseline
Adding persistent memory
The decoder-only challenge: no cross-attention
Adapting the Six Methods to Decoder-Only Models
Shared write rule
M.1: Self-Attention KV Prefix
...and 32 more sections

Figures (12)

Figure 1: Frozen decoder-only baseline. The hidden state $H_t$ is consumed within the current turn and then discarded; no information persists across sessions.
Figure 2: Three decoder-only injection strategies. (a) KV prefix prepends memory-derived keys and values to the self-attention cache. (b) Parallel cross-attention inserts a new attention pathway alongside the existing self-attention. (c) Gated branch computes a memory readout and adds it through a learned content-dependent gate.
Figure 3: M.1: Memory is projected into soft key--value pairs and prepended to the self-attention cache at every layer. The write rule updates $P$ from $H_t$.
Figure 4: M.2: A parallel cross-attention layer is inserted after each frozen self-attention block. The scaling factor $\beta^{(\ell)}$ is initialised to zero for safe startup.
Figure 5: M.3: Memory is projected into per-layer key--value pairs and concatenated with the frozen self-attention cache. Unlike M.1, both the read projections and the write update are layer-specific.
...and 7 more figures
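
The captions for Figures 3 and 4 describe two read paths that can be illustrated with a toy, single-head example. The sketch below is not the paper's implementation: it uses plain Python lists rather than a frozen GPT-2, and the function names (`attend_with_memory_prefix`, `gated_parallel_read`), the projection matrices `W_k` and `W_v`, and the additive form `hidden + beta * readout` are all assumptions made for illustration. It only shows the two ideas stated in the captions: memory rows projected to soft key--value pairs and prepended to the self-attention keys and values (M.1), and a parallel memory readout scaled by a factor `beta` that starts at zero (M.2).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def attend(query, keys, values):
    """Scaled dot-product attention for one query over key/value rows."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights


def attend_with_memory_prefix(query, keys, values, memory, W_k, W_v):
    """M.1-style read (assumed form): project each memory row to a soft
    key and value, then prepend them to the token keys and values."""
    mem_keys = [matvec(W_k, p) for p in memory]
    mem_values = [matvec(W_v, p) for p in memory]
    return attend(query, mem_keys + keys, mem_values + values)


def gated_parallel_read(hidden, memory, W_k, W_v, beta):
    """M.2-style read (assumed form): attend over memory only and add the
    readout to the hidden state, scaled by beta."""
    mem_keys = [matvec(W_k, p) for p in memory]
    mem_values = [matvec(W_v, p) for p in memory]
    readout, _ = attend(hidden, mem_keys, mem_values)
    return [h + beta * r for h, r in zip(hidden, readout)]


# Hypothetical toy data: one memory row, two token positions, width 2.
identity = [[1.0, 0.0], [0.0, 1.0]]
memory = [[2.0, 0.0]]
keys = [[0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]

prefix_out, prefix_weights = attend_with_memory_prefix(
    query, keys, values, memory, identity, identity)
startup_out = gated_parallel_read(query, memory, identity, identity, beta=0.0)
```

With `beta=0.0` the gated read returns the hidden state unchanged, which is one way to read the caption's "safe startup": the frozen path is untouched until training moves `beta` away from zero. In the prefix read, the attention weights cover the memory slot and the two token positions together, so memory competes with context inside a single softmax.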
