Persistent Memory Through Triple-Loop Consolidation in a Non-Gradient Dissipative Cognitive Architecture

Jianwei Lou

Abstract

Dissipative cognitive architectures maintain computation through continuous energy expenditure, where units that exhaust their energy are stochastically replaced with fresh random state. This creates a fundamental challenge: how can persistent, context-specific memory survive when all learnable state is periodically destroyed? Existing memory mechanisms -- including elastic weight consolidation, synaptic intelligence, and surprise-driven gating -- rely on gradient computation and are inapplicable to non-gradient dissipative systems. We introduce Deep Memory (DM), a non-gradient persistent memory mechanism operating through a triple-loop consolidation cycle: (1) recording of expert-specific content centroids, (2) seeding of replaced units with stored representations, and (3) stabilization through continuous re-entry. We demonstrate that discrete expert routing via Mixture-of-Experts (MoE) gating is a causal prerequisite for DM, preventing centroid convergence that would render stored memories identical. Across ${\sim}970$ simulation runs spanning thirteen experimental blocks: (i) discrete routing is causally necessary for specialization ($\text{MI}=1.10$ vs. $0.001$; $n=91$); (ii) DM achieves $R=0.984$ vs. $0.385$ without memory ($n=16$); (iii) continuous seeding reconstructs representations after interference ($R_\mathrm{recon}=0.978$; one-shot fails; $n=30$); (iv) the mechanism operates within a characterized $(K,p)$ envelope ($n=350$); (v) recording $\times$ seeding is the minimal critical dyad ($n=40$); (vi) DM outperforms non-gradient baselines (Hopfield, ESN) under matched turnover ($n=370$). These results establish DM as a falsifiable mechanism for persistent memory in non-gradient cognitive systems, with functional parallels to hippocampal consolidation.
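
The triple-loop cycle can be made concrete with a short simulation sketch. The Python fragment below is our own minimal illustration under stated assumptions, not the authors' implementation: the names (`step`, `memory`, `alpha`, `rho`), the energy dynamics, and the turnover rule are simplified placeholders. It shows how recording (Loop I), seeding (Loop II), and stabilization (Loop III) interact with stochastic unit replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, d = 4, 64, 16          # experts, units, content dimension (illustrative)
alpha, rho = 0.1, 0.05       # recording EMA rate (Loop I), re-entry rate (Loop III)

content = rng.normal(size=(N, d))   # per-unit content vectors z_i
energy = np.ones(N)                 # dissipative energy budgets
memory = np.zeros((K, d))           # Deep Memory: one stored centroid per expert

def step(expert_of_unit):
    """One cognitive cycle: record, stabilize, dissipate, seed."""
    global energy
    for k in range(K):
        idx = expert_of_unit == k
        if not idx.any():
            continue
        # Loop I -- recording: EMA of the expert-specific content centroid.
        memory[k] = (1 - alpha) * memory[k] + alpha * content[idx].mean(axis=0)
        # Loop III -- stabilization: continuous re-entry of the stored centroid.
        content[idx] = (1 - rho) * content[idx] + rho * memory[k]
    # Dissipation: units spend energy; exhausted units are replaced.
    energy -= rng.uniform(0.0, 0.1, size=N)
    dead = energy <= 0
    if dead.any():
        # Loop II -- seeding: replaced units restart from their expert's stored
        # representation rather than from fresh random state.
        content[dead] = memory[expert_of_unit[dead]] \
            + 0.01 * rng.normal(size=(int(dead.sum()), d))
        energy[dead] = 1.0

expert_of_unit = rng.integers(0, K, size=N)   # fixed routing for illustration
for _ in range(200):
    step(expert_of_unit)
```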

Paper Structure

This paper contains 41 sections, 3 theorems, 3 equations, 7 figures, 3 tables, and 1 algorithm.

Key Result

Proposition 2

Consider $N$ units with content vectors $\{z_i\}_{i=1}^N$ updated via the exponential moving average rule $z_i \leftarrow (1 - \alpha)\,z_i + \alpha\,\bar{x}_i$ across $K$ contexts with centroids $\{c_k\}_{k=1}^K$. Under Uniform Activation (Definition 1), $\lim_{t \to \infty} \operatorname{Var}\bigl(\{z_i\}_{i=1}^N\bigr) = 0$: every content vector converges to the grand centroid $\bar{c} = \tfrac{1}{K}\sum_{k=1}^{K} c_k$, rendering stored memories identical.
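
The following self-contained check illustrates the proposition numerically (our sketch; the stochastic activation model and all parameter values are our assumptions). It contrasts Uniform Activation, under which the $z_i$ collapse toward the grand centroid, with fixed discrete routing, under which per-expert representations stay separated -- the failure mode the abstract says MoE gating prevents.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, d, alpha, T = 32, 4, 8, 0.02, 4000
c = rng.normal(size=(K, d))                  # context centroids c_k
c_bar = c.mean(axis=0)                       # grand centroid

def run(routing):
    z = rng.normal(size=(N, d))
    assign = rng.integers(0, K, size=N)      # fixed unit->context binding
    for _ in range(T):
        if routing == "uniform":             # UA: each unit sees any context
            k = rng.integers(0, K, size=N)
        else:                                # discrete routing: stable binding
            k = assign
        z = (1 - alpha) * z + alpha * c[k]   # EMA update z_i <- (1-a)z_i + a*x_i
    return z

z_ua = run("uniform")
z_moe = run("routed")
print(np.linalg.norm(z_ua - c_bar, axis=1).mean())   # small: grand-centroid collapse
print(np.var(z_moe, axis=0).sum())                   # large: experts stay distinct
```

As $\alpha \to 0$ the residual spread under Uniform Activation vanishes, matching the $\operatorname{Var} \to 0$ limit in the proposition.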

Figures (7)

  • Figure 1: Schematic of the non-gradient cognitive cycle with triple-loop consolidation. Content activations in the grid are routed to one of $K$ experts; Deep Memory records expert-specific centroids (Loop I), reseeds replaced units (Loop II), and continuously stabilizes representations against dissipative drift (Loop III). Stochastic turnover periodically replaces units, creating the persistence challenge that the triple-loop addresses.
  • Figure 2: Discrete routing is necessary for structural specialization. (A) Firing selectivity: Full MoE achieves near-perfect context-exclusive firing ($f_{\text{sel}} = 0.959$); binding disruption collapses to baseline. (B) Structural separation via silhouette score. (C) Mutual information between context and expert assignment: Full MoE achieves the theoretical maximum ($\ln K$); per-cycle random permutation eliminates binding ($\text{MI} = 0.001$). $n = 91$ runs, 14 seeds. (A sketch of the MI computation follows this list.)
  • Figure 3: Deep Memory creates persistent representations. Full DM with correct expert mapping achieves $R = 0.984$, consistent across all contexts. Global control ($R = 0.385$) lacks context specificity. Mismatched write ($R \approx 0$) produces catastrophic failure; noise write ($R = 0.935$) degrades gracefully. $n = 16$ runs, 4 seeds.
  • Figure 4: Functional reconstruction requires continuous seeding and expert--content alignment. Continuous seeding achieves near-complete recovery ($R_{\text{recon}} = 0.978$); one-shot seeding fails ($R_{\text{recon}} = 0.305 \approx$ baseline). Noise seeding recovers partially ($0.712$; content-specificity gap $= 0.266$). Wrong-expert seeding is harmful ($0.125 <$ baseline). $n = 30$ runs, 5 seeds.
  • Figure 5: Operating envelope across expert count $K$, injection rate $p$, and block size. Blue: Pass (7/10); orange: Degraded (2/10); red: Fail (1/10). Low $p$ and high $K$ define the degradation boundary. The threshold line at $R = 0.70$ separates Pass from sub-threshold regimes. $n = 115$ runs, 5 seeds.
  • ...and 2 more figures
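
The mutual-information metric in Figure 2 can be estimated with a standard plug-in estimator over routing counts. The snippet below is our sketch of how such a number arises, not the paper's evaluation code; note that with $K = 3$ the theoretical maximum is $\ln 3 \approx 1.10$, consistent with the reported $\text{MI} = 1.10$.

```python
import numpy as np

def mutual_information(contexts, experts):
    """MI(context; expert) in nats, from empirical joint counts."""
    cs, ci = np.unique(contexts, return_inverse=True)
    es, ei = np.unique(experts, return_inverse=True)
    joint = np.zeros((len(cs), len(es)))
    np.add.at(joint, (ci, ei), 1.0)          # contingency table of routings
    p = joint / joint.sum()
    pc = p.sum(axis=1, keepdims=True)        # marginal over contexts
    pe = p.sum(axis=0, keepdims=True)        # marginal over experts
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (pc @ pe)[nz])).sum())

rng = np.random.default_rng(0)
K, n = 3, 9000
ctx = rng.integers(0, K, size=n)
print(mutual_information(ctx, ctx))                         # perfect binding: ~ln 3
print(mutual_information(ctx, rng.integers(0, K, size=n)))  # random routing: ~0
```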

Theorems & Definitions (5)

  • Definition 1: Uniform Activation (UA)
  • Proposition 2: Grand-Centroid Collapse
  • Proof sketch (for Proposition 2)
  • Proposition 3
  • Proposition 4: DM Seeding Maintains Effective Routing Under Turnover