
Toward a Theory of Hierarchical Memory for Language Agents

Yashar Talebirad, Ali Parsaee, Csongor Y. Szepesvari, Amirhossein Nadiri, Osmar Zaiane

Abstract

Many recent long-context and agentic systems address context-length limitations by adding hierarchical memory: they extract atomic units from raw data, build multi-level representatives by grouping and compression, and traverse this structure to retrieve content under a token budget. Despite recurring implementations, there is no shared formalism for comparing design choices. We propose a unifying theory in terms of three operators. Extraction ($\alpha$) maps raw data to atomic information units; coarsening ($C = (\pi, \rho)$) partitions units and assigns a representative to each group; and traversal ($\tau$) selects which units to include in context given a query and budget. We identify a self-sufficiency spectrum for the representative function $\rho$ and show how it constrains viable retrieval strategies (a coarsening-traversal coupling). Finally, we instantiate the decomposition on eleven existing systems spanning document hierarchies, conversational memory, and agent execution traces, showcasing its generality.

Paper Structure

This paper contains 23 sections, 6 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: The $(\alpha, C, \tau)$ pipeline: $D$ is extracted ($\alpha$) into atoms $G_0$; coarsening $C_1, C_2, C_3$ yields layers $G_1, G_2, G_3$; traversal $\tau$ takes a query and budget and returns a subset $S$ of atoms.
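To make the pipeline in Figure 1 concrete, the following is a minimal toy sketch, not the authors' implementation. Everything in it is an assumption chosen for illustration: the `Node` class, sentence-level extraction, fixed-size consecutive grouping for $\pi$, a "first three words of each member" representative for $\rho$, word-overlap scoring, and a word-count budget. It only shows how the three operators compose: $\alpha$ produces atoms $G_0$, repeated coarsening builds higher layers, and $\tau$ returns a subset $S$ of atoms for a query and budget.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One unit in the hierarchy (hypothetical type, for illustration only)."""
    text: str                                        # atom content or group representative
    children: list["Node"] = field(default_factory=list)  # empty for atoms


def extract(document: str) -> list[Node]:
    """alpha: raw data -> atoms G_0. Assumption: one atom per sentence."""
    return [Node(s.strip()) for s in document.split(".") if s.strip()]


def coarsen(layer: list[Node], group_size: int = 2) -> list[Node]:
    """C = (pi, rho). Assumed pi: consecutive groups of `group_size`.
    Assumed rho: the first three words of each member, joined together."""
    groups = [layer[i:i + group_size] for i in range(0, len(layer), group_size)]
    return [
        Node(" / ".join(" ".join(n.text.split()[:3]) for n in g), list(g))
        for g in groups
    ]


def score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def traverse(top_layer: list[Node], query: str, budget: int) -> list[Node]:
    """tau: best-first descent from the top layer; returns atoms whose
    total word count stays within `budget`."""
    selected, used = [], 0
    frontier = list(top_layer)
    while frontier:
        # Visit the node whose text best matches the query first.
        frontier.sort(key=lambda n: score(query, n.text), reverse=True)
        node = frontier.pop(0)
        if node.children:
            frontier.extend(node.children)   # representative: descend to its members
        else:
            cost = len(node.text.split())    # atom: include it if it still fits
            if used + cost <= budget:
                selected.append(node)
                used += cost
    return selected


doc = "Cats sleep a lot. Dogs bark loudly. Fish swim in water. Birds fly south."
g0 = extract(doc)    # atoms
g1 = coarsen(g0)     # first coarsened layer
g2 = coarsen(g1)     # second coarsened layer
print([n.text for n in traverse(g2, "fish swim", budget=4)])
```

In this sketch the representative is lossy, so traversal always descends to atoms rather than returning a representative itself. This is one simple way to picture the coarsening-traversal coupling mentioned in the abstract, although the paper's actual definitions may differ.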

Theorems & Definitions (2)

  • Definition 1: Affinity
  • Definition 2: $W$-coherent partition