The Price of Meaning: Why Every Semantic Memory System Forgets

Sambartha Ray Barman, Andrey Starenky, Sofia Bodnar, Nikhil Narasimhan, Ashwin Gopinath

Abstract

Every major AI memory system in production today organises information by meaning. That organisation enables generalisation, analogy, and conceptual retrieval -- but it comes at a price. We prove that the same geometric structure enabling semantic generalisation makes interference, forgetting, and false recall inescapable. We formalise this tradeoff for \textit{semantically continuous kernel-threshold memories}: systems whose retrieval score is a monotone function of an inner product in a semantic feature space with finite local intrinsic dimension. Within this class we derive four results: (1) semantically useful representations have finite effective rank; (2) finite local dimension implies positive competitor mass in retrieval neighbourhoods; (3) under growing memory, retention decays to zero, yielding power-law forgetting curves under power-law arrival statistics; (4) for associative lures satisfying a $\delta$-convexity condition, false recall cannot be eliminated by threshold tuning. We test these predictions across five architectures: vector retrieval, graph memory, attention-based context, BM25 filesystem retrieval, and parametric memory. Pure semantic systems express the vulnerability directly as forgetting and false recall. Reasoning-augmented systems partially override these symptoms but convert graceful degradation into catastrophic failure. Systems that escape interference entirely do so by sacrificing semantic generalisation. The price of meaning is interference, and no architecture we tested avoids paying it.
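
To make the theorem class concrete, the following is a minimal sketch of kernel-threshold retrieval as characterised in the abstract: the retrieval score is a monotone function of an inner product in the semantic feature space, and every stored item scoring above the threshold falls inside the retrieval cap. The cosine kernel, the threshold value, and the toy embeddings are illustrative assumptions, not the paper's experimental setup.

import numpy as np

def retrieve(query, memory, tau):
    """Indices of stored items whose kernel score meets the threshold tau.

    The score is a monotone function (here the identity) of the inner
    product between unit-normalised query and memory embeddings.
    """
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    scores = m @ q                       # inner products in semantic feature space
    return np.where(scores >= tau)[0]    # everything inside the retrieval cap

# Toy memory: a target, a near competitor, and an unrelated item.
memory = np.array([
    [1.0, 0.1, 0.0, 0.0],   # target
    [0.9, 0.2, 0.1, 0.0],   # near competitor in the same semantic neighbourhood
    [0.0, 0.0, 1.0, 0.0],   # unrelated item
])
query = np.array([1.0, 0.1, 0.0, 0.0])
print(retrieve(query, memory, tau=0.9))  # [0 1]: the competitor sits inside the cap,
                                         # the interference the theorems formalise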

Paper Structure

This paper contains 28 sections, 7 theorems, 17 figures, and 5 tables.

Key Result

Theorem 1

Let $K$ be the semantic kernel with Mercer eigenpairs $(\lambda_j, \psi_j)$. Under Axioms A1--A3, for every optimal encoder under distortion budget $D$, there exists a threshold $\gamma(D)$ such that the encoder factors through the truncated semantic statistic $\Phi_\gamma(x) = (\sqrt{\lambda_j}\,\psi_j(x))_{j:\,\lambda_j \ge \gamma(D)}$; in particular, its effective rank is bounded by the number of eigenvalues with $\lambda_j \ge \gamma(D)$, which is finite.
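
As a numerical illustration only (not the paper's construction), the truncated statistic can be approximated by eigendecomposing a finite Gram matrix of the kernel and discarding components whose eigenvalue falls below the threshold; the RBF kernel, the sample size, and the value of $\gamma$ below are assumptions for the sketch.

import numpy as np

def truncated_semantic_statistic(gram, gamma):
    """Phi_gamma evaluated at the samples: sqrt(lambda_j) * psi_j(x_i),
    keeping only components with lambda_j >= gamma."""
    n = gram.shape[0]
    eigvals, eigvecs = np.linalg.eigh(gram / n)   # empirical Mercer eigenpairs
    keep = eigvals >= gamma                       # spectral truncation at gamma(D)
    psi = np.sqrt(n) * eigvecs[:, keep]           # empirical eigenfunctions at the samples
    return psi * np.sqrt(eigvals[keep])           # rows are Phi_gamma(x_i)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
gram = np.exp(-0.5 * sq_dists / X.shape[1])       # illustrative RBF semantic kernel
phi = truncated_semantic_statistic(gram, gamma=0.01)
print(phi.shape)  # (200, k): k components retained, i.e. a finite effective rank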

Figures (17)

  • Figure 1: The No-Escape Theorem: logical structure (paper roadmap). This figure maps the paper's argument. Under the kernel-threshold theorem class (Axioms A1--A5): the semantic kernel and rate-distortion optimality yield finite semantic effective rank (Theorem 1); local regularity yields positive cap mass (Theorem 2); growing memory yields inevitable forgetting (Theorem 3), with power-law arrival and population heterogeneity producing power-law forgetting curves. Independently, associative $\delta$-convexity yields lure inseparability (Theorem 4). No architecture within this class avoids these consequences without abandoning semantic continuity or adding an external symbolic verifier. Each arrow represents a step derived under stated assumptions and supported by empirical tests across the architectures studied here.
  • Figure 2: Interference produces forgetting across architecturally distinct memory systems. a, Vector DB and b, Graph show smooth power-law forgetting curves converging toward the human range ($b \approx 0.3$--$0.7$, red dashed). c, Attention shows a phase transition (logistic fit: $n_0 \approx 120$, $k \approx 0.03$; power-law fitting is inappropriate for this sigmoid failure mode). d, Filesystem (BM25) shows $b = 0$ (no semantic interference). e, Parametric (PopQA) shows monotonic accuracy decline with neighbour density. Category 1 systems degrade continuously; Category 2 systems fail discontinuously. $n = 5$ seeds throughout.
  • Figure 3: The forgetting exponent depends on competitor count, not architecture. Forgetting exponent $b$ vs. number of near competitors for embedding architectures (Vector DB, Graph) with human reference ($b \approx 0.5$, dashed). Both converge toward the human range at high competitor counts. Shaded: bootstrap $95\%$ CI, $n = 5$ seeds.
  • Figure 4: False recall is geometrically inevitable. a, Hit rate, lure false alarm rate, and unrelated FA for all five architectures and human data. Embedding architectures show elevated lure FA; LLM architectures show FA $= 0$ at behavioural level (explicit list-checking). b, Lure FA rates compared directly. The geometric prediction ($24/24$ lures within spherical caps) holds for all architectures regardless of behavioural output. $n = 5$ seeds, $24$ DRM lists.
  • Figure 5: Effective dimensionality converges far below nominal regardless of architecture. $d_\text{eff}$ (participation ratio) vs. $d_\text{nom}$ for all five architectures. Grey: biological range ($d_\text{eff} = 100$--$500$) [stringer2019, gao2017]. Qwen hidden states ($d_\text{nom} = 3{,}584$) compress to $d_\text{eff} = 17.9$, a $200\times$ reduction. All architectures cluster below the interference threshold. (A sketch of the participation-ratio computation follows this list.)
  • ...and 12 more figures
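
A minimal sketch of the participation ratio used as the effective-dimensionality measure in the Figure 5 caption, standardly defined as $d_\text{eff} = (\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$ over the eigenvalues of the representation covariance; the synthetic embeddings and the 512/20 dimensions below are illustrative, not the paper's data.

import numpy as np

def participation_ratio(embeddings):
    """Effective dimensionality of a set of embeddings (rows are items)."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    lam = np.linalg.eigvalsh(np.cov(centered, rowvar=False))
    lam = np.clip(lam, 0.0, None)                 # clip tiny negative numerical noise
    return float(lam.sum() ** 2 / (lam ** 2).sum())

rng = np.random.default_rng(0)
# Nominally 512-dimensional vectors whose variance lives in ~20 directions.
embeddings = rng.normal(size=(1000, 20)) @ rng.normal(size=(20, 512))
embeddings += 0.01 * rng.normal(size=(1000, 512))
print(participation_ratio(embeddings))            # far below the nominal 512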

Theorems & Definitions (13)

  • Definition 1: Semantic Proximity Property
  • Definition 2: Axiom A1: Kernel-Threshold Retrieval
  • Definition 3: Axiom A2: Semantic Sufficiency
  • Definition 4: Axiom A3: Rate-Distortion Optimality
  • Definition 5: Axiom A4: Local Regularity
  • Definition 6: Axiom A5: Associative Convexity
  • Theorem 1: Semantic Spectral Bound; proof sketch
  • Theorem 2: Positive Cap Mass
  • Theorem 3: Inevitable Forgetting Under Growing Memory
  • Corollary 4: Stretched Exponential Per-Item Retention
  • ...and 3 more