Facts as First Class Objects: Knowledge Objects for Persistent LLM Memory

Oliver Zahn, Simran Chana

Abstract

Large language models increasingly serve as persistent knowledge workers, with in-context memory (facts stored in the prompt) as the default strategy. We benchmark in-context memory against Knowledge Objects (KOs), discrete hash-addressed tuples with O(1) retrieval. Within the context window, Claude Sonnet 4.5 achieves 100% exact-match accuracy from 10 to 7,000 facts (97.5% of its 200K window). However, production deployment reveals three failure modes: capacity limits (prompts overflow at 8,000 facts), compaction loss (summarization destroys 60% of facts), and goal drift (cascading compaction erodes 54% of project constraints while the model continues with full confidence). KOs achieve 100% accuracy across all conditions at 252x lower cost. On multi-hop reasoning, KOs reach 78.9% versus 31.6% for in-context. Cross-model replication across four frontier models confirms compaction loss is architectural, not model-specific. We additionally show that embedding retrieval fails on adversarial facts (20% precision at 1) and that neural memory (Titans) stores facts but fails to retrieve them on demand. We introduce density-adaptive retrieval as a switching mechanism and release the benchmark suite.
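
To make "discrete hash-addressed tuples with O(1) retrieval" concrete, the following is a minimal sketch, not the authors' implementation. It assumes a fact is a (subject, predicate, value) tuple addressed by a SHA-256 hash of its normalized subject and predicate; the names `KnowledgeObject`, `KOStore`, `ko_key`, and `two_hop`, along with the example facts, are hypothetical and do not come from the paper.

```python
import hashlib
from typing import NamedTuple, Optional


class KnowledgeObject(NamedTuple):
    """One stored fact. The three-field schema is an assumption."""
    subject: str
    predicate: str
    value: str


def ko_key(subject: str, predicate: str) -> str:
    """Derive a stable address from the normalized subject and predicate."""
    raw = f"{subject.strip().lower()}\x1f{predicate.strip().lower()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class KOStore:
    """Hypothetical fact store backed by a dict (average-case O(1) lookup)."""

    def __init__(self) -> None:
        self._facts: dict[str, KnowledgeObject] = {}

    def put(self, subject: str, predicate: str, value: str) -> str:
        key = ko_key(subject, predicate)
        self._facts[key] = KnowledgeObject(subject, predicate, value)
        return key

    def get(self, subject: str, predicate: str) -> Optional[KnowledgeObject]:
        # The lookup cost does not depend on how many facts are stored.
        return self._facts.get(ko_key(subject, predicate))

    def two_hop(self, subject: str, first: str, second: str) -> Optional[KnowledgeObject]:
        """Chain two exact lookups: the first hop's value is the second hop's subject."""
        hop = self.get(subject, first)
        if hop is None:
            return None
        return self.get(hop.value, second)


store = KOStore()
store.put("Project Atlas", "lead", "Dana")
store.put("Dana", "office", "Berlin")
```

Because each fact sits at its own address, summarizing or truncating the prompt cannot alter it; only the few facts a query needs are retrieved and placed in context.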

Paper Structure (58 sections, 5 equations, 7 figures, 12 tables, 1 algorithm)

Figures (7)

  • Figure 1: Scaling curve: exact-match accuracy vs. corpus size ($N{=}10$ to $N{=}10{,}000$). Claude Sonnet 4.5 maintains 100% accuracy through $N{=}7{,}000$ (97.5% of context window), then overflows. GPT-4o drops to 0% by $N{=}3{,}000$. KO maintains 100% at all $N$. The dashed line marks the 200K token context window boundary.
  • Figure 2: Fact retrieval accuracy after 36.7$\times$ compaction. In-context memory loses 60% of facts; KO maintains 100%.
  • Figure 3: Goal drift under cascading compaction. Left: Stacked bars showing correct, partial, and lost constraints after each round. Right: Accuracy decay vs. compression ratio. KO maintains 100% regardless of compaction.
  • Figure 4: Multi-hop reasoning accuracy on 2-hop queries over a 500-fact corpus. KO-grounded retrieval achieves 78.9% accuracy, a 47.3 percentage point improvement over full in-context presentation (31.6%).
  • Figure 5: Cross-domain synthesis quality scores (1--5 scale) across four dimensions. The largest improvement is in groundedness (+118%), where KO retrieval enables claims traceable to specific stored facts.
  • ...and 2 more figures
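
The capacity limit reported for Figure 1 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only the stated figures (7,000 facts occupy 97.5% of a 200K-token window) and assumes, as a simplification not taken from the paper, that prompt size grows linearly with the number of facts and that instruction overhead is negligible.

```python
CONTEXT_WINDOW_TOKENS = 200_000       # stated window size
FACTS_AT_REFERENCE = 7_000            # largest corpus reported to fit
WINDOW_FRACTION_AT_REFERENCE = 0.975  # stated share of the window used at 7,000 facts

# Implied average cost per fact under the linear-scaling assumption.
tokens_per_fact = (
    CONTEXT_WINDOW_TOKENS * WINDOW_FRACTION_AT_REFERENCE / FACTS_AT_REFERENCE
)


def estimated_prompt_tokens(n_facts: int) -> float:
    """Estimate prompt size for n_facts, assuming linear growth."""
    return n_facts * tokens_per_fact


def fits_in_window(n_facts: int) -> bool:
    """True if the estimated prompt stays within the context window."""
    return estimated_prompt_tokens(n_facts) <= CONTEXT_WINDOW_TOKENS


for n in (7_000, 8_000):
    print(n, round(estimated_prompt_tokens(n)), fits_in_window(n))
```

Under these assumptions each fact costs roughly 28 tokens, so an 8,000-fact prompt would need about 223K tokens, which is consistent with the reported overflow at that size.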