A Miniature Brain Transformer: Thalamic Gating, Hippocampal Lateralization, Amygdaloid Salience, and Prefrontal Working Memory in Attention-Coupled Latent Memory

Hong Jeong

TL;DR

A miniature brain transformer extends the attention-coupled latent memory framework with four additional brain-region analogues, coupled by inhibitory callosal cross-talk between lateralized hippocampal banks. Its ablation yields a novel, falsifiable prediction -- no lateralization without working memory context -- and a principled, neurobiologically motivated blueprint for hierarchical persistent memory in sequence models.

Abstract

We present a miniature brain transformer architecture that extends the attention-coupled latent memory framework with four additional brain-region analogues: a thalamic relay, an amygdaloid salience module, a prefrontal working-memory (PFC) buffer, and a cerebellar fast-path, all coupled by inhibitory callosal cross-talk between lateralized hippocampal banks. We evaluate on a two-domain benchmark -- MQAR (Multi-Query Associative Recall; episodic domain) and modular arithmetic (+1 mod 10; rule-based domain) -- using a seven-variant additive ablation. The central empirical finding is a surprise: inhibitory callosal coupling alone never lateralizes the banks (variants 1-5 maintain D_sep ~ 0.25 and P_ct ~ 0.25 for all 30 epochs). Functional lateralization requires the synergy of PFC and inhibition: only once the PFC buffer is added does a sharp, discontinuous phase transition fire -- at epoch 11 for the +PFC variant (variant 6) and epoch 10 for the full model (variant 7) -- collapsing P_ct from 0.25 to ~0.002 and roughly doubling D_sep from 0.251 to 0.501 in a single gradient step. The PFC buffer acts as a symmetry-breaker: its slowly drifting domain context creates the initial asymmetry that the inhibitory feedback loop then amplifies irreversibly. The cerebellar fast-path accelerates the transition by one epoch (epoch 10 vs. epoch 11) with no asymptotic change, confirming its convergence-acceleration role. The result constitutes a novel, falsifiable prediction -- no lateralization without working memory context -- and a principled, neurobiologically motivated blueprint for hierarchical persistent memory in sequence models.
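
To make the two task domains concrete, the following is a minimal sketch of how one example from each might be generated. It is an illustration under stated assumptions, not the paper's data pipeline: the function names, vocabulary size, number of key-value pairs, and token-id layout are all hypothetical, and only the task definitions (recall the value bound to each queried key; map x to x + 1 mod 10) come from the abstract.

```python
import random


def make_mqar_example(num_pairs=4, num_queries=2, vocab=16, rng=None):
    """Build one toy MQAR example (episodic domain).

    All sizes and the id layout are illustrative assumptions.
    Returns (context, queries, targets): context interleaves key and value
    tokens, and targets[i] is the value bound to queries[i].
    """
    if rng is None:
        rng = random.Random(0)
    keys = rng.sample(range(vocab), num_pairs)  # distinct key ids
    # Values come from a disjoint id range so keys and values never collide.
    values = [rng.randrange(vocab, 2 * vocab) for _ in keys]
    context = [tok for pair in zip(keys, values) for tok in pair]
    lookup = dict(zip(keys, values))
    queries = rng.sample(keys, num_queries)  # keys to be recalled
    targets = [lookup[k] for k in queries]
    return context, queries, targets


def make_mod_example(rng=None):
    """Build one rule-based example: x -> (x + 1) mod 10."""
    if rng is None:
        rng = random.Random(0)
    x = rng.randrange(10)
    return x, (x + 1) % 10


if __name__ == "__main__":
    print(make_mqar_example(rng=random.Random(1)))
    print(make_mod_example(rng=random.Random(1)))
```

The contrast the benchmark relies on is visible here: the MQAR target depends on bindings that change with every example, while the modular target follows one fixed rule.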

Paper Structure (63 sections, 14 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Standard transformer vs. brain transformer architecture. (a) A conventional deep transformer encodes the entire relevant context in a long input sequence on every forward pass; the resulting activations are volatile and are discarded at the end of each call. (b) Our brain-inspired architecture offloads long-term associative storage into persistent, lateralized hippocampal memory banks. The encoder itself can therefore remain thin, processing only a short prompt at inference time, while the memory banks accumulate a "big persistent brain" that survives across forward passes. Both variants are trained end-to-end with standard supervised learning.
  • Figure 2: The miniature brain architecture. Five neuroscientifically motivated modules are connected through the $A^\top\!AVW$ write-back operator. Solid arrows show the primary information flow; dashed red arrows show callosal inhibitory cross-talk; dotted blue arrows show modulatory signals.
  • Figure 3: Lateralization dynamics across all seven ablation variants. (a) Lateralization map: colour encodes $\mathcal{D}_{sep}$ as a function of training epoch (x-axis) and variant (y-axis). All five variants without a PFC buffer (top rows) remain at the uniform equilibrium (blue) throughout training. Only +PFC (variant 6) and Full (variant 7) break symmetry at epochs 11 and 10, respectively (dashed white lines). (b) Bifurcation curves: $\mathcal{D}_{sep}$ over time. Dotted vertical lines mark the transition epoch; the grey band shows the unlateralized plateau ($\mathcal{D}_{sep}\approx 0.25$). (c) Cross-talk penalty $\mathcal{P}_{ct}$ (inverted; lower is better); mirror of panel (b). (d) MQAR recall accuracy, confirming that task performance does not diverge across variants despite the routing bifurcation.
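
The transition epochs marked in Figure 3 could be read off a per-epoch $\mathcal{D}_{sep}$ curve with a simple jump detector, sketched below. This is a hedged illustration, not the paper's analysis code: the function name and both thresholds are hypothetical, and the curves are synthetic placeholders built only from the two values quoted in the abstract (0.251 before the transition, 0.501 after), not measured data.

```python
def find_transition_epoch(d_sep, plateau=0.25, tol=0.05, min_jump=0.1):
    """Return the 1-indexed epoch at which D_sep first jumps off the plateau.

    d_sep[i] is the separation metric after epoch i + 1. A transition is
    declared when the previous epoch sits within `tol` of the plateau and the
    increase exceeds `min_jump`. Both thresholds are assumed values.
    Returns None if no such jump occurs.
    """
    for i in range(1, len(d_sep)):
        on_plateau = abs(d_sep[i - 1] - plateau) < tol
        if on_plateau and d_sep[i] - d_sep[i - 1] > min_jump:
            return i + 1  # convert list index to 1-indexed epoch
    return None


if __name__ == "__main__":
    # Synthetic placeholder curves over 30 epochs (not the paper's data).
    no_pfc = [0.251] * 30                  # stays on the plateau
    with_pfc = [0.251] * 10 + [0.501] * 20  # jumps at epoch 11
    print(find_transition_epoch(no_pfc))    # None
    print(find_transition_epoch(with_pfc))  # 11
```

Requiring the previous epoch to lie on the plateau keeps the detector from firing on later fluctuations once a curve has already lateralized.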