
From Lossy to Verified: A Provenance-Aware Tiered Memory for Agents

Qiming Zhu, Shunian Chen, Rui Yu, Zhehao Wu, Benyou Wang

TL;DR

TierMem is a provenance-linked framework that casts retrieval as an inference-time evidence allocation problem. It uses a two-tier memory hierarchy to answer with the cheapest sufficient evidence: it queries a fast summary index by default and escalates to an immutable raw-log store only when the summary evidence is insufficient.
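The answer-or-escalate policy can be illustrated as a small control loop. This is a minimal sketch under stated assumptions, not TierMem's implementation: the runtime sufficiency router is replaced by a hypothetical word-coverage check (`is_sufficient`), retrieval is a toy keyword match (`retrieve`), and input cost is approximated by whitespace word counts (`count_tokens`). It shows only the control flow: the raw tier is read when, and only when, the summary tier fails the sufficiency check.

```python
from dataclasses import dataclass


@dataclass
class Result:
    route: str           # "answer" (summaries only) or "escalate" (raw log consulted)
    evidence: list[str]  # the texts the answer would be grounded on
    tokens: int          # rough input cost of that evidence


def count_tokens(texts: list[str]) -> int:
    # Crude tokenizer stand-in: count whitespace-separated words.
    return sum(len(t.split()) for t in texts)


def retrieve(query: str, store: list[str]) -> list[str]:
    # Hypothetical keyword retrieval: keep entries sharing at least one word with the query.
    terms = set(query.lower().split())
    return [s for s in store if terms & set(s.lower().split())]


def is_sufficient(query: str, evidence: list[str]) -> bool:
    # Hypothetical router stand-in: "sufficient" only if every query word occurs in the evidence.
    covered = set(" ".join(evidence).lower().split())
    return set(query.lower().split()) <= covered


def answer_with_cheapest_evidence(query: str, summaries: list[str],
                                  raw_log: list[str]) -> Result:
    tier1 = retrieve(query, summaries)
    if tier1 and is_sufficient(query, tier1):
        return Result("answer", tier1, count_tokens(tier1))  # cheap path: summaries only
    tier2 = retrieve(query, raw_log)  # escalate: ground on the longer raw log instead
    return Result("escalate", tier2, count_tokens(tier2))


summaries = ["alice likes thai food"]
raw_log = ["alice said she likes thai food but has a peanut allergy"]
print(answer_with_cheapest_evidence("alice food", summaries, raw_log).route)     # answer
print(answer_with_cheapest_evidence("alice allergy", summaries, raw_log).route)  # escalate
```

In the second query the summary has dropped the allergy, so the check fails and the raw log is read instead, at a higher input cost (11 vs. 4 words in this toy).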

Abstract

Long-horizon agents often compress interaction histories into write-time summaries. This creates a fundamental write-before-query barrier: compression decisions are made before the system knows what a future query will hinge on. As a result, summaries can cause unverifiable omissions: decisive constraints (e.g., allergies) may be dropped, leaving the agent unable to justify an answer with traceable evidence. Retaining raw logs restores an authoritative source of truth, but grounding on raw logs by default is expensive: many queries are answerable from summaries, yet raw grounding still requires processing far longer contexts, inflating token consumption and latency. We propose TierMem, a provenance-linked framework that casts retrieval as an inference-time evidence allocation problem. TierMem uses a two-tier memory hierarchy to answer with the cheapest sufficient evidence: it queries a fast summary index by default, and a runtime sufficiency router escalates to an immutable raw-log store only when the summary evidence is insufficient. TierMem then writes back verified findings as new summary units linked to their raw sources. On LoCoMo, TierMem achieves 0.851 accuracy (vs. 0.873 for the raw-only baseline) while reducing input tokens by 54.1% and latency by 60.7%.
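The write-back step can be sketched as a guarded append to the summary tier. This is a hedged illustration, not the paper's procedure: `SummaryUnit`, `write_back`, and the verification rule used here (a key evidence span must occur verbatim in every cited raw page) are hypothetical stand-ins, since the abstract does not specify how findings are verified. What the sketch shows is the shape of the idea: a finding enters the summary tier only if raw pages back it, the new unit keeps links to those pages, and the raw pages themselves are never modified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryUnit:
    text: str
    sources: tuple[int, ...]  # provenance: ids of raw-log pages supporting this summary


def write_back(finding: str, evidence_span: str, page_ids: list[int],
               raw_pages: dict[int, str], summaries: list[SummaryUnit]) -> bool:
    """Append `finding` as a new summary unit only if it is backed by the cited raw pages.

    Hypothetical verification rule: `evidence_span` must occur verbatim in every cited page.
    Returns True if a provenance-linked unit was written, False otherwise.
    """
    if not page_ids or not evidence_span:
        return False  # nothing to link to or check against: refuse the write-back
    if not all(evidence_span in raw_pages.get(pid, "") for pid in page_ids):
        return False  # the claimed evidence is missing from the raw log
    summaries.append(SummaryUnit(finding, tuple(page_ids)))
    return True


raw_pages = {7: "alice said she likes thai food but has a peanut allergy"}
summaries = [SummaryUnit("alice likes thai food", (7,))]
ok = write_back("alice has a peanut allergy", "peanut allergy", [7], raw_pages, summaries)
bad = write_back("alice is vegan", "vegan", [7], raw_pages, summaries)
print(ok, bad, len(summaries))  # True False 2
```

After the successful write-back, a later query about the allergy could be answered from the summary tier alone, with the `sources` field pointing back to page 7 for auditing.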

Paper Structure

This paper contains 127 sections, 7 equations, 10 figures, 15 tables, and 1 algorithm.

Figures (10)

  • Figure 1: TierMem overview. TierMem maintains a provenance-linked two-tier memory hierarchy. Tier-1 is a fast summary index whose entries store (i) compact summaries and (ii) provenance links $\rho$ to supporting Tier-2 raw pages in an immutable paged log. Given a query, TierMem retrieves Tier-1 evidence and uses a lightweight router $\pi_\theta$ to choose Answer (summaries only) or Escalate (consult linked raw pages, then run a bounded retrieval procedure if needed). After escalation, TierMem can optionally perform verified write-back to update Tier-1 with evidence-backed findings linked to their raw sources.
  • Figure 2: The judge prompt used to evaluate whether the retrieved summaries provide sufficient context to derive the ground-truth answer.
  • Figure 3: The updated router prompt incorporating explicit chain-of-thought reasoning to improve decision reliability.
  • Figure 4: The prompt used for extracting atomic facts from conversation streams.
  • Figure 5: Prompt for integrating retrieved facts to answer a query.
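The hierarchy described in the Figure 1 caption can be sketched with two small types. This is an illustrative sketch, not the paper's code: `PagedLog`, `Tier1Entry`, `escalate`, and the `max_extra_pages` budget are hypothetical names; the "if needed" test is a hypothetical word-coverage check; and the bounded retrieval procedure is replaced by a keyword scan whose output is capped at `max_extra_pages` extra pages. Immutability of the raw log is approximated by storing pages in a tuple that is only ever extended.

```python
from dataclasses import dataclass


class PagedLog:
    """Tier-2 raw store: append-only; existing pages are never edited or removed."""

    def __init__(self) -> None:
        self._pages: tuple[str, ...] = ()  # a tuple, so stored pages cannot be mutated in place

    def append(self, text: str) -> int:
        self._pages = self._pages + (text,)
        return len(self._pages) - 1  # id of the new page

    def page(self, pid: int) -> str:
        return self._pages[pid]

    def __len__(self) -> int:
        return len(self._pages)


@dataclass(frozen=True)
class Tier1Entry:
    summary: str
    rho: tuple[int, ...]  # provenance links to supporting Tier-2 page ids


def words(text: str) -> set[str]:
    return set(text.lower().split())


def escalate(query: str, entries: list[Tier1Entry], log: PagedLog,
             max_extra_pages: int = 2) -> list[str]:
    """Read the raw pages linked from retrieved Tier-1 entries; search further only if needed."""
    terms = words(query)
    linked = {pid for e in entries for pid in e.rho}
    pages = [log.page(pid) for pid in sorted(linked)]
    if any(terms <= words(p) for p in pages):
        return pages  # some linked page already covers every query word
    # Bounded fallback: at most `max_extra_pages` unlinked pages sharing a word with the query.
    extra = [log.page(pid) for pid in range(len(log))
             if pid not in linked and terms & words(log.page(pid))]
    return pages + extra[:max_extra_pages]


log = PagedLog()
p0 = log.append("alice likes thai food")
log.append("bob moved to lisbon")
log.append("alice mentioned a peanut allergy")
entries = [Tier1Entry("alice enjoys thai food", (p0,))]
print(escalate("alice allergy", entries, log))
# ['alice likes thai food', 'alice mentioned a peanut allergy']
```

Following the provenance links first keeps the common case cheap: only the pages a summary was built from are read, and the wider search runs only when those pages do not cover the query.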