
The Coordinate System Problem in Persistent Structural Memory for Neural Architectures

Abhinaba Basu

Abstract

We introduce the Dual-View Pheromone Pathway Network (DPPN), an architecture that routes sparse attention through a persistent pheromone field over latent slot transitions, and use it to discover two independent requirements for persistent structural memory in neural networks. Through five progressively refined experiments using up to 10 seeds per condition across 5 model variants and 4 transfer targets, we identify a core principle: persistent memory requires a stable coordinate system, and any coordinate system learned jointly with the model is inherently unstable. We characterize three obstacles (pheromone saturation, surface-structure entanglement, and coordinate incompatibility) and show that none of contrastive updates, multi-source distillation, Hungarian alignment, or semantic decomposition resolves the instability when embeddings are learned from scratch. Fixed random Fourier features provide extrinsic coordinates that are stable, structure-blind, and informative, but coordinate stability alone is insufficient: routing-bias pheromone does not transfer (10 seeds, p>0.05). For within-task learning, DPPN outperforms both transformer and random sparse baselines (AULC 0.700 vs. 0.680 vs. 0.670). Replacing routing bias with learning-rate modulation eliminates negative transfer: warm pheromone used as a learning-rate prior achieves +0.003 on same-family tasks (17 seeds, p<0.05) while never reducing performance. A structure completion function over extrinsic coordinates produces a +0.006 same-family bonus beyond regularization, showing that the catch-22 between stability and informativeness is partially permeable to learned functions. The contribution is the identification of two independent requirements for persistent structural memory: (a) coordinate stability and (b) a graceful transfer mechanism.
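
The fixed random Fourier features mentioned above can be illustrated with the standard random Fourier feature map, phi(x) = sqrt(2/D) cos(Wx + b). This is a minimal sketch under stated assumptions, not the paper's implementation: the Gaussian frequency distribution, the bandwidth `sigma`, the `seed`, and the helper name `make_fourier_coordinates` are all hypothetical. It shows the property the argument depends on: the map is sampled once from a fixed seed and never trained, so independently trained models that share the seed see identical coordinates.

```python
import math
import random


def make_fourier_coordinates(in_dim, out_dim, sigma=1.0, seed=0):
    """Return a fixed random Fourier feature map phi: R^in_dim -> R^out_dim.

    Hypothetical helper. Frequencies W ~ N(0, 1/sigma^2) and phases
    b ~ U[0, 2*pi) are drawn once from `seed` and never updated, so the
    coordinates cannot drift during training.
    """
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(out_dim)]
    scale = math.sqrt(2.0 / out_dim)

    def phi(x):
        # phi_j(x) = sqrt(2/D) * cos(w_j . x + b_j)
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b_j)
                for w, b_j in zip(W, b)]

    return phi


# Two "independently trained" models that share the seed share the coordinates.
phi_a = make_fourier_coordinates(in_dim=1, out_dim=32, sigma=4.0, seed=7)
phi_b = make_fourier_coordinates(in_dim=1, out_dim=32, sigma=4.0, seed=7)
print(phi_a([3.0]) == phi_b([3.0]))  # True
```

In expectation, phi(x) . phi(y) approximates the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)), so nearby inputs (for example, nearby positions in a Position-Only variant) receive similar coordinates. That is the sense in which such coordinates are informative while remaining blind to task structure.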
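
One way to read "warm pheromone as a learning-rate prior" is as a per-entry multiplier on the step size. The sketch below is illustrative only and assumes, as a hypothetical simplification, that each slot transition owns one scalar parameter and that pheromone is clamped to the bounds τ_min = 0.1 and τ_max = 2.0 reported for Figure 2, then min-max normalized into a multiplier in [1, 1 + alpha]. The names `lr_multipliers`, `modulated_sgd_step`, and `alpha` are hypothetical, and the paper's actual modulation rule may differ. Keeping the multiplier at or above 1 means the prior can only speed up learning on some transitions and never freezes any, which loosely mirrors the "never reducing performance" behavior reported above.

```python
def lr_multipliers(tau, tau_min=0.1, tau_max=2.0, alpha=1.0):
    """Map pheromone values to per-entry learning-rate multipliers in [1, 1 + alpha].

    Pheromone is clamped to [tau_min, tau_max] and min-max normalized, so a
    transition at tau_min keeps the base learning rate and one at tau_max
    gets (1 + alpha) times it. The multiplier never drops below 1.
    """
    span = tau_max - tau_min
    return [[1.0 + alpha * (min(max(t, tau_min), tau_max) - tau_min) / span for t in row]
            for row in tau]


def modulated_sgd_step(params, grads, tau, base_lr=0.01, alpha=1.0):
    """One plain SGD step with each entry's step size scaled by warm pheromone.

    Hypothetical simplification: one scalar parameter per slot transition.
    """
    mult = lr_multipliers(tau, alpha=alpha)
    return [[p - base_lr * m * g for p, g, m in zip(p_row, g_row, m_row)]
            for p_row, g_row, m_row in zip(params, grads, mult)]


warm_tau = [[0.1, 2.0]]  # one weak and one saturated transition
print(lr_multipliers(warm_tau))  # [[1.0, 2.0]]
```

The design choice here is to let pheromone change how fast weights move rather than where attention is routed, so a misleading prior costs some speed on the target task instead of forcing the wrong routes.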

Paper Structure (90 sections, 13 equations, 5 figures, 10 tables)

Figures (5)

  • Figure 1: (a) DPPN architecture schematic. Tokens are embedded and passed through dual soft groupers to produce slot assignments. Slot-level support and agreement are combined with the persistent pheromone field (highlighted) to produce pheromone-biased routing, which generates a sparse attention mask. A fast/slow gate fusion produces the final output. (b) Transfer protocol. In Phase 1, the model is trained on a source task and pheromone accumulates structural memory. In Phase 2, all model weights are reset but pheromone is either kept (warm) or reset (cold), and the model is trained on a target task.
  • Figure 2: Pheromone field $\boldsymbol{\tau} \in \mathbb{R}^{32 \times 32}$ before and after source training (DPPN, seed 42). (a) At initialization, pheromone is near-uniform ($\bar{\tau} \approx 1.05$). (b) After 80 epochs, the field is sparse and structured: most transitions have decayed to $\tau_{\min} = 0.1$ (blue), with a small number of high-pheromone transitions at $\tau_{\max} = 2.0$ (red).
  • Figure 3: Transfer advantage ($\Delta$ AULC: warm distilled $-$ cold) across model variants and transfer targets. Same-family tasks (A2, A3) are shaded green; different-family tasks (B1, C1) are shaded red. With 3 seeds, the Position-Only Fourier variant appeared to show positive transfer on A2 ($+0.003$), but with 10 seeds the advantage is $-0.001 \pm 0.005$ (not significant). All routing-bias variants show uniformly negative or zero transfer advantages. Error bars: standard deviation over seeds.
  • Figure 4: The diagnostic cascade. Five experiments, each resolving one obstacle (left, green) while revealing the next (right, red). The center column shows the key metric from each experiment. All obstacles trace to the same root cause: persistent memory requires stable coordinates, and learned coordinates are inherently unstable.
  • Figure 5: Coordinate stability diagnostics. (a) Slot alignment correlation between independently trained DPPN models: 3.5%, barely above the 3.1% (1/32) expected by random chance with 32 slots. (b) Distillation survival: the number of high-magnitude transitions (out of 1024) surviving element-wise minimum distillation. Position-Only Fourier preserves the most transitions (22), consistent with better coordinate stability.
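
The saturation pattern in Figure 2 (most transitions at τ_min, a few pinned at τ_max) is what an evaporate, deposit, and clamp update tends to produce. The sketch below assumes an ant-colony-style rule for illustration; only the bounds τ_min = 0.1 and τ_max = 2.0, the 32 × 32 field size, and the near-uniform start of about 1.05 come from the captions, while the evaporation rate `rho`, the `deposit` amount, the usage pattern, and the name `update_pheromone` are hypothetical.

```python
def update_pheromone(tau, usage, rho=0.05, deposit=0.2, tau_min=0.1, tau_max=2.0):
    """Evaporate every transition, deposit on used ones, then clamp to [tau_min, tau_max].

    tau and usage are square lists of lists; usage[i][j] is how often the
    slot transition i -> j was used in the current step (hypothetical signal).
    """
    n = len(tau)
    return [[min(tau_max, max(tau_min, (1.0 - rho) * tau[i][j] + deposit * usage[i][j]))
             for j in range(n)]
            for i in range(n)]


n = 32
tau = [[1.05] * n for _ in range(n)]  # near-uniform start, as in Figure 2(a)
# Toy usage: each slot only ever transitions to the next slot (cyclically).
usage = [[1.0 if j == (i + 1) % n else 0.0 for j in range(n)] for i in range(n)]
for _ in range(200):
    tau = update_pheromone(tau, usage)
flat = [t for row in tau for t in row]
print(flat.count(2.0), flat.count(0.1))  # 32 992
```

In this toy run, unused entries decay to τ_min within about 50 steps, and used entries hit τ_max within about 10 steps because the unclamped fixed point deposit / rho = 4 lies above τ_max. The clamp is what turns the field into the saturated, nearly binary map that limits how much structure it can encode.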
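
Figure 3 reports transfer advantage as ΔAULC (warm distilled minus cold). This section does not define AULC precisely, so the sketch below assumes a common convention: the trapezoidal area under an evenly sampled accuracy curve, normalized by the number of intervals so that a constant curve at accuracy a scores exactly a. The names `aulc` and `transfer_advantage` are hypothetical.

```python
def aulc(accuracies):
    """Normalized area under a learning curve (trapezoidal rule, evenly spaced evaluations)."""
    if len(accuracies) < 2:
        raise ValueError("need at least two evaluation points")
    area = sum((a + b) / 2.0 for a, b in zip(accuracies, accuracies[1:]))
    return area / (len(accuracies) - 1)


def transfer_advantage(warm_curve, cold_curve):
    """Delta AULC: positive when keeping warm pheromone helps on the target task."""
    return aulc(warm_curve) - aulc(cold_curve)


print(round(aulc([0.5, 0.7, 0.9]), 6))  # 0.7
```

Because AULC integrates over the whole curve, it rewards faster learning as well as a better final accuracy, which is why small values such as +0.003 can still reflect a consistent head start.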
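
Figure 5(b) counts high-magnitude transitions that survive element-wise minimum distillation. A minimal sketch of that count, with a toy case showing why coordinate incompatibility matters: if two models learn the same transition structure under different slot labelings, the minimum wipes it out. The helper names and the survival `threshold` are hypothetical; the threshold used in the paper is not given in this section.

```python
def min_distill(fields):
    """Element-wise minimum across several same-shaped pheromone fields."""
    return [[min(vals) for vals in zip(*rows)] for rows in zip(*fields)]


def count_survivors(field, threshold):
    """Number of transitions whose distilled pheromone exceeds `threshold`."""
    return sum(1 for row in field for t in row if t > threshold)


def permute_slots(field, perm):
    """Relabel slots: the transition i -> j becomes perm[i] -> perm[j]."""
    n = len(field)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[perm[i]][perm[j]] = field[i][j]
    return out


# Toy 4-slot field with two strong transitions (0 -> 1 and 1 -> 2).
low, high = 0.1, 2.0
a = [[low] * 4 for _ in range(4)]
a[0][1] = a[1][2] = high
same_coords = [row[:] for row in a]        # same structure, same slot labels
shuffled = permute_slots(a, [2, 3, 0, 1])  # same structure, different labels
print(count_survivors(min_distill([a, same_coords]), threshold=1.0))  # 2
print(count_survivors(min_distill([a, shuffled]), threshold=1.0))     # 0
```

Because the minimum keeps a transition only when every source agrees it is strong, the survival count directly measures how well the sources' slot coordinates line up, which is why a variant with more stable coordinates retains more transitions.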