Trained Persistent Memory for Frozen Encoder--Decoder LLMs: Six Architectural Methods

Hong Jeong

Abstract

Frozen encoder--decoder language models are stateless: the latent representation is discarded after every forward pass, so no information persists across sessions. This paper presents a \textbf{proof-of-concept pilot study} showing that persistent memory in the \emph{continuous latent space} of a frozen LLM is feasible -- even under severe resource constraints (a single frozen Flan-T5-XL backbone, small trainable adapters, a single dataset). We implement six architectural methods spanning three injection points and four write mechanisms; unlike text-level memory systems, every write and read is a differentiable operation on dense vectors. After training only the adapter, the memory bank continues to accumulate at inference time without gradients, enabling \emph{conversational learning}. Under a forgetting-curve evaluation on LoCoMo at two capacity scales (1$\times$ and 10$\times$), the stateless baseline scores exactly zero; at 10$\times$ all six trained adapters produce positive memory-recall curves; at 1$\times$ three methods collapse, revealing capacity as a critical design parameter. Because the memory bank is a compact numerical array, it can be scaled to arbitrarily large capacity without altering the backbone. We argue that full end-to-end training with larger models, larger data, and orders-of-magnitude larger memory will yield substantially stronger results; this pilot study establishes the feasibility baseline and design-space taxonomy that such efforts require.
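To make the idea of a memory bank that keeps accumulating at inference time concrete, the following is a minimal sketch, not the paper's implementation. It assumes a fixed-capacity bank of dense vectors, a softmax-attention read, and a ring-buffer write that overwrites the oldest slot when the bank is full. The class name `MemoryBank`, the overwrite policy, and the scaled dot-product scoring are all illustrative assumptions; the trained adapters and the frozen backbone are omitted entirely.

```python
import math


class MemoryBank:
    """Toy persistent memory: a fixed-capacity list of dense vectors.

    Hypothetical illustration only; names and policies are assumptions.
    """

    def __init__(self, capacity, dim):
        self.capacity = capacity
        self.dim = dim
        self.slots = []      # persists across turns (never reset per forward pass)
        self.next_slot = 0   # ring-buffer write pointer (assumed policy)

    def write(self, vector):
        """Store one latent vector; overwrite the oldest slot once full.

        No gradient is needed here: the write is a plain array update.
        """
        assert len(vector) == self.dim
        if len(self.slots) < self.capacity:
            self.slots.append(list(vector))
        else:
            self.slots[self.next_slot] = list(vector)
        self.next_slot = (self.next_slot + 1) % self.capacity

    def read(self, query):
        """Softmax-attention read: a weighted average of the stored vectors."""
        if not self.slots:
            return [0.0] * self.dim  # empty memory contributes nothing
        scale = math.sqrt(self.dim)
        scores = [
            sum(q * s for q, s in zip(query, slot)) / scale
            for slot in self.slots
        ]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(score - top) for score in scores]
        total = sum(weights)
        return [
            sum(w * slot[i] for w, slot in zip(weights, self.slots)) / total
            for i in range(self.dim)
        ]


# Usage: a bank with capacity 2 forgets the first of three writes.
bank = MemoryBank(capacity=2, dim=2)
for latent in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    bank.write(latent)
recalled = bank.read([1.0, 0.0])
```

In this toy setting the capacity directly bounds how many past latents remain retrievable. It is only a simplified analogue of the capacity sensitivity the abstract reports between the 1$\times$ and 10$\times$ scales.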

Paper Structure

This paper contains 41 sections, 26 equations, 9 figures, and 5 tables.

Introduction
Related Work
Parameter-efficient adaptation.
Attention-coupled latent memory.
Cognitive memory systems.
Problem Setting and Notation
Trained Alternatives
M. 1: Memory as Encoder-Input Prefix
M. 2: Parallel Decoder Cross-Attention
M. 3: Decoder KV Extension
M. 4: Hebbian / Associative Memory
M. 5: Context-Gated Decoder Memory Branch
M. 6: Slot-Based Memory with Sparse Write
Training and Inference
Gradient flow through frozen networks.
...and 26 more sections

Figures (9)

Figure 1: Frozen encoder--decoder baseline used as the stateless control. The latent representation is consumed within the current turn and then discarded.
Figure 2: M. 1 injects persistent memory as an encoder-input prefix and writes the current latent back into memory through an attention-coupled update.
Figure 3: M. 2 preserves the frozen cross-attention route over the current encoder latent and adds a parallel decoder memory branch scaled by a learned coefficient.
Figure 4: M. 3 extends decoder keys and values with learned projections of persistent memory while keeping the current encoder tokens on their original path.
Figure 5: M. 4 stores associative structure in a Hebbian memory matrix that is queried by the current latent and passed to the decoder as additional memory.
...and 4 more figures
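
The caption of Figure 5 describes a Hebbian memory matrix queried by the current latent. The following is a minimal sketch of the classical outer-product form of such a memory, offered as an assumption about the general technique rather than the paper's exact update rule. The function names `hebbian_write` and `hebbian_read`, the `rate` parameter, and the absence of decay, normalization, or learned projections are all illustrative choices.

```python
def hebbian_write(memory, key, value, rate=1.0):
    """Add rate * outer(value, key) to the memory matrix, in place."""
    for i, v in enumerate(value):
        for j, k in enumerate(key):
            memory[i][j] += rate * v * k


def hebbian_read(memory, query):
    """Query the memory with a matrix-vector product: memory @ query."""
    return [sum(m * q for m, q in zip(row, query)) for row in memory]


# Usage: store two key/value pairs under orthonormal keys, then recall one.
memory = [[0.0, 0.0], [0.0, 0.0]]  # rows index value dims, columns index key dims
hebbian_write(memory, key=[1.0, 0.0], value=[2.0, 3.0])
hebbian_write(memory, key=[0.0, 1.0], value=[5.0, 7.0])
recalled = hebbian_read(memory, query=[1.0, 0.0])  # → [2.0, 3.0]
```

With orthonormal keys the recall is exact. With correlated keys the stored values interfere, which is one reason practical associative memories typically add normalization or decay.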
