HTM-EAR: Importance-Preserving Tiered Memory with Hybrid Routing under Saturation

Shubham Kumar Singh

TL;DR

HTM-EAR is a hierarchical tiered memory substrate that integrates HNSW-based working memory (L1) with archival storage (L2), combining importance-aware eviction and hybrid routing. Under saturation, the full model preserves active-query precision.

Abstract

Memory constraints in long-running agents require structured management of accumulated facts while preserving essential information under bounded context limits. We introduce HTM-EAR, a hierarchical tiered memory substrate that integrates HNSW-based working memory (L1) with archival storage (L2), combining importance-aware eviction and hybrid routing. When L1 reaches capacity, items are evicted using a weighted score of importance and usage. Queries are first resolved in L1; if similarity or entity coverage is insufficient, retrieval falls back to L2, and candidates are re-ranked using a cross-encoder. We evaluate the system under sustained saturation (15,000 facts; L1 capacity 500; L2 capacity 5000) using synthetic streams across five random seeds and real BGL system logs. Ablation studies compare the full system against variants without cross-encoder re-ranking, without routing gates, with LRU eviction, and an oracle with unbounded memory. Under saturation, the full model preserves active-query precision (MRR = 1.000) while enabling controlled forgetting of stale history, approaching oracle active performance (0.997 +/- 0.003). In contrast, LRU minimizes latency (21.1 ms) but permanently evicts 2416 essential facts. On BGL logs, the full system achieves MRR 0.336, close to the oracle (0.370), while LRU drops to 0.069. Code is publicly available at: https://github.com/shubham-61291/HTM-EAR
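
The eviction and routing behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class and field names, the weights `w_importance` and `w_usage`, the gate thresholds `sim_gate` and `coverage_gate`, and the usage squashing function are all assumptions chosen for illustration. A brute-force cosine scan stands in for the HNSW index, the L2 capacity limit is omitted, and the cross-encoder re-ranking step is replaced by a plain cosine ranking over the pooled candidates.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    vec: tuple                      # embedding (toy 2-d vectors in the usage below)
    importance: float               # assumed to lie in [0, 1]
    entities: frozenset = frozenset()
    hits: int = 0                   # usage counter, bumped on each L1 hit


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TieredMemory:
    """Toy two-tier store; every default value here is an illustrative assumption."""

    def __init__(self, l1_capacity, w_importance=0.7, w_usage=0.3,
                 sim_gate=0.8, coverage_gate=1.0):
        self.l1, self.l2 = [], []
        self.l1_capacity = l1_capacity
        self.w_importance, self.w_usage = w_importance, w_usage
        self.sim_gate, self.coverage_gate = sim_gate, coverage_gate

    def keep_score(self, item):
        # Weighted score of importance and usage; the lowest score is evicted first.
        usage = item.hits / (1 + item.hits)      # squash the hit count into [0, 1)
        return self.w_importance * item.importance + self.w_usage * usage

    def insert(self, item):
        self.l1.append(item)
        if len(self.l1) > self.l1_capacity:
            victim = min(self.l1, key=self.keep_score)
            self.l1.remove(victim)
            self.l2.append(victim)               # demote to the archive tier

    def query(self, vec, entities=frozenset()):
        # Try L1 first and accept its answer only if both gates pass.
        best = max(self.l1, key=lambda it: cosine(vec, it.vec), default=None)
        if best is not None:
            sim = cosine(vec, best.vec)
            coverage = (len(entities & best.entities) / len(entities)
                        if entities else 1.0)
            if sim >= self.sim_gate and coverage >= self.coverage_gate:
                best.hits += 1
                return best, "L1"
        # Otherwise fall back: pool L1 and L2 candidates and rank them.
        # A cross-encoder would re-rank here; cosine is only a stand-in.
        pool = self.l1 + self.l2
        if not pool:
            return None, "miss"
        return max(pool, key=lambda it: cosine(vec, it.vec)), "fallback"


mem = TieredMemory(l1_capacity=2)
mem.insert(Item("disk failure on nodeA", (1.0, 0.0), 0.9, frozenset({"nodeA"})))
mem.insert(Item("routine heartbeat nodeB", (0.0, 1.0), 0.1, frozenset({"nodeB"})))
mem.insert(Item("link flap on nodeC", (1.0, 1.0), 0.5, frozenset({"nodeC"})))

hit, route = mem.query((1.0, 0.0), frozenset({"nodeA"}))
print(route, "->", hit.text)
hit, route = mem.query((0.0, 1.0), frozenset({"nodeB"}))
print(route, "->", hit.text)
```

In this toy run the low-importance heartbeat fact is the one demoted when L1 overflows, and a later query for it fails the L1 similarity gate and is answered through the fallback path. Replacing the keep score with a recency-only rule would correspond to the LRU ablation named in the abstract.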

Paper Structure

This paper contains 11 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: HTM-EAR architecture illustrating tiered ANN retrieval, importance-aware eviction, and hybrid routing under memory saturation.
  • Figure 2: Precision decay under saturation (Scenario B).
  • Figure 3: Pareto frontier between active retrieval quality and latency. The failure zone indicates MRR below 0.6.
  • Figure 4: Performance comparison on BGL log benchmark.