Pay Attention Later: From Vector Space Diffusion to Linearithmic Spectral Phase-Locking
Alper Yıldırım, İbrahim Yücedağ
TL;DR
The paper argues that standard Transformers pay a Semantic Alignment Tax: diffusive, gradient-based optimization must organize a chaotic initial map, which hinders rapid integration of new concepts. It proposes Iterative Semantic Map Refinement (ISMR), a diagnostic protocol that isolates this geometric alignment cost, and introduces PRISM, a harmonic architecture that encodes semantic identity as resonant frequencies and uses Gated Harmonic Convolutions with FFT-based global interactions. Empirical results on WMT14 show that, while the Transformer retains a slight edge in general translation quality, PRISM delivers superior few-shot plasticity with minimal forgetting, effectively decoupling memory from reasoning. The work suggests a shift toward harmonic representations to overcome the plasticity-stability trade-off and outlines concrete future directions, including H-LoRA and H-SSM, for scalable spectral architectures.
Abstract
Standard Transformers suffer from a "Semantic Alignment Tax", a prohibitive optimization cost required to organize a chaotic initialization into a coherent geometric map via local gradient diffusion. We hypothesize that this reliance on diffusive learning creates "Catastrophic Rigidity", rendering models unable to adapt to novel concepts without destroying their pre-trained reasoning capabilities. To isolate this phenomenon, we introduce Iterative Semantic Map Refinement (ISMR), a diagnostic protocol revealing that alignment is a fixed geometric barrier that scaling cannot solve: a 20-layer model overcomes this barrier no faster than a 1-layer model. To address it, we propose the Phase-Resonant Intelligent Spectral Model (PRISM). PRISM encodes semantic identity as resonant frequencies in the complex domain (C^d) and replaces quadratic self-attention with linearithmic, O(N log N), Gated Harmonic Convolutions. We validate PRISM on the WMT14 translation task. While the Standard Transformer maintains a slight edge in general competence on static benchmarks (23.88 vs 21.40 BLEU), it fails the "Plasticity-Stability" stress test completely. When injected with novel concepts, the Transformer suffers Catastrophic Forgetting, losing 10.55 BLEU points while achieving only 60% acquisition. In contrast, PRISM demonstrates Lossless Plasticity, achieving 96% 5-shot acquisition with negligible degradation (-0.84 BLEU). These results suggest that harmonic representations effectively decouple memory from reasoning, offering a structural solution to the plasticity-stability dilemma in real-time knowledge adaptation.
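To illustrate why FFT-based global interactions are linearithmic rather than quadratic, the following is a minimal NumPy sketch of a gated circular convolution over complex-valued sequences. It is not the authors' implementation: the function names, the per-channel kernel, and the sigmoid gate on the real part of a linear projection are all assumptions chosen for illustration. The only point it demonstrates is that mixing every position with every other position can be done in O(N log N) through the convolution theorem.

```python
import numpy as np

def fft_global_mix(x, kernel):
    """Circular convolution along the sequence axis, computed via FFT.

    x, kernel: complex arrays of shape (N, d). Each of the d channels is
    convolved independently, so the cost is O(d * N log N) instead of the
    O(d * N^2) of a direct all-pairs sum.
    """
    X = np.fft.fft(x, axis=0)
    K = np.fft.fft(kernel, axis=0)
    # Convolution theorem: pointwise product in frequency space.
    return np.fft.ifft(X * K, axis=0)

def gated_harmonic_conv(x, kernel, gate_weights):
    """Hypothetical gated variant (an assumption, not the paper's layer).

    gate_weights: array of shape (d, d) used to project the input; the
    gate is a sigmoid of the real part of that projection.
    """
    mixed = fft_global_mix(x, kernel)
    gate = 1.0 / (1.0 + np.exp(-(x @ gate_weights).real))
    return gate * mixed

# Usage on random complex inputs in C^d.
rng = np.random.default_rng(0)
N, d = 16, 4
x = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
kernel = rng.normal(size=(N, d)) + 1j * rng.normal(size=(N, d))
gate_weights = rng.normal(size=(d, d))
y = gated_harmonic_conv(x, kernel, gate_weights)
```

The output `y` has the same shape as `x`, and every output position depends on every input position, which is the sense in which the interaction is global.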
