Inhibitory Cross-Talk Enables Functional Lateralization in Attention-Coupled Latent Memory

Hong Jeong

TL;DR

The inhibitory model reduces cipher-domain loss by $124{\times}$ over the baseline while matching it on the arithmetic domain, confirming that persistent lateralized memory is necessary for episodic recall but not for rule-based prediction.

Abstract

We present a memory-augmented transformer in which attention serves simultaneously as a retrieval, consolidation, and write-back operator. The core update, $A^\top A V W$, re-grounds retrieved values into persistent memory slots via the Gram matrix $A^\top A$, providing a principled tripartite projection: observation space $\to$ latent memory $\to$ supervised transformation. We partition the memory into lateralized left and right banks coupled through a sign-controlled cross-talk matrix $W_s$, and show that the sign of this coupling is decisive for specialization. Excitatory cross-talk ($s=+1$) causes bank-dominance collapse: one bank monopolises all inputs and $\mathcal{P}_{ct} \to 0.5$, despite lowering task loss. Inhibitory cross-talk ($s=-1$), motivated by the net inhibitory effect of callosal projections in human cortex, actively suppresses contralateral bank activation and achieves saturated specialization ($\mathcal{D}_{sep} = \pm 1.00$, $\mathcal{P}_{ct} \approx 0$). On a controlled symbolic benchmark combining an episodic bijection cipher (requiring associative recall) with a strict arithmetic progression (requiring rule extraction), the inhibitory model reduces cipher-domain loss by $124{\times}$ over the baseline while matching it on the arithmetic domain, confirming that persistent lateralized memory is necessary for episodic recall but not for rule-based prediction.
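
The core update $A^\top A V W$ can be read as three matrix products applied in sequence. The NumPy sketch below is illustrative only. It assumes $A$ is a row-softmax attention matrix from tokens to memory slots, and the names `gram_write_back`, `X`, `K`, and the shapes are hypothetical choices not taken from the paper. It shows that retrieving with $A$ and writing back with $A^\top$ is the same as applying the Gram matrix $A^\top A$ to the slot values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gram_write_back(X, K, V, W):
    """Sketch of the A^T A V W update (shapes are assumptions).

    X: (n_tokens, d) observations used as queries
    K: (n_slots, d) memory keys, V: (n_slots, d) memory values
    W: (d, d) supervised transformation
    """
    d = X.shape[-1]
    A = softmax(X @ K.T / np.sqrt(d), axis=-1)  # (n_tokens, n_slots)
    retrieved = A @ V                # retrieval: slots -> observation space
    consolidated = A.T @ retrieved   # write-back: observation space -> slots
    return consolidated @ W, A       # (n_slots, d) update and attention

rng = np.random.default_rng(0)
n_tokens, n_slots, d = 6, 4, 8
X = rng.normal(size=(n_tokens, d))
K = rng.normal(size=(n_slots, d))
V = rng.normal(size=(n_slots, d))
W = rng.normal(size=(d, d))

update, A = gram_write_back(X, K, V, W)
gram = A.T @ A  # (n_slots, n_slots) Gram matrix over memory slots
```

Because the update has one row per memory slot, it can be added to or blended with the persistent slot values directly; how the paper combines it with the existing memory is not specified in this excerpt.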

Paper Structure (15 sections, 11 equations, 3 figures, 3 tables)

This paper contains 15 sections, 11 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The tripartite projection sequence mapping latent values to the observation space and back, before applying the supervised transformation.
  • Figure 2: Bidirectional cross-talk pathways for lateralized memory (inhibitory mode, $s=-1$, proposed). The dashed red lines show the paths where one bank's values reach the contralateral update. Under inhibitory cross-talk, these paths carry a negative coefficient: the right bank's values $V_r$ suppress the left bank's update (and vice versa), sharpening bank separation rather than sharing representations. Setting $s=+1$ reverses these signs to excitatory; setting the cross-weights to zero gives the split-brain baseline.
  • Figure 3: Training convergence of the Attention-Coupled Lateral model over 50 epochs. Panel 1 (task loss): cross-entropy drops from 1.12 to 0.03. Panel 2 (total loss): routing auxiliary term drives total loss slightly negative once routing saturates. Panel 3 ($\mathcal{D}_{sep}$): Separation Degree converges after brief instabilities at epochs 15 and 33. Panel 4 ($\mathcal{P}_{ct}$): Cross-Talk Penalty collapses to $\approx 0$ within 4 epochs and stays there, confirming stable lateralized routing throughout training.
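
The three coupling regimes described in the Figure 2 caption (inhibitory, excitatory, split-brain) can be sketched in a few lines. This is a hedged illustration, not the paper's formulation: the additive form of the cross-talk term, the function name `lateral_update`, and the cross-weight matrix `C` are all assumptions, since the excerpt does not give the exact equation for $W_s$.

```python
import numpy as np

def lateral_update(G_l, G_r, V_l, V_r, W, C, s):
    """Hypothetical bank updates with sign-controlled cross-talk.

    G_l, G_r: (m, m) per-bank Gram matrices (A^T A restricted to each bank)
    V_l, V_r: (m, d) left and right bank values
    W: (d, d) shared transformation; C: (d, d) cross-weights (assumed form)
    s: +1 for excitatory, -1 for inhibitory coupling
    """
    own_l = G_l @ V_l @ W
    own_r = G_r @ V_r @ W
    # Each bank's update receives the contralateral values, scaled by s.
    new_l = own_l + s * (V_r @ C)
    new_r = own_r + s * (V_l @ C)
    return new_l, new_r

rng = np.random.default_rng(0)
m, d = 3, 4
A_l, A_r = rng.random((5, m)), rng.random((5, m))
G_l, G_r = A_l.T @ A_l, A_r.T @ A_r
V_l, V_r = rng.normal(size=(m, d)), rng.normal(size=(m, d))
W = np.eye(d)
C = 0.1 * np.eye(d)

inhib_l, inhib_r = lateral_update(G_l, G_r, V_l, V_r, W, C, s=-1)
excit_l, excit_r = lateral_update(G_l, G_r, V_l, V_r, W, C, s=+1)
# Zero cross-weights: each bank updates independently (split-brain baseline).
split_l, split_r = lateral_update(G_l, G_r, V_l, V_r, W, np.zeros((d, d)), s=-1)
```

In this sketch, flipping $s$ mirrors the contralateral contribution around the split-brain baseline, which matches the caption's description that $s=+1$ reverses the signs of the cross paths.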