
ByteRover: Agent-Native Memory Through LLM-Curated Hierarchical Context

Andy Nguyen, Danh Doan, Hoang Pham, Bao Ha, Dat Pham, Linh Nguyen, Hieu Nguyen, Thien Nguyen, Cuong Do, Phat Nguyen, Toan Nguyen

Abstract

Memory-Augmented Generation (MAG) extends large language models with external memory to support long-context reasoning, but existing approaches universally treat memory as an external service that agents call into, delegating storage to separate pipelines of chunking, embedding, and graph extraction. This architectural separation means the system that stores knowledge does not understand it, leading to semantic drift between what the agent intended to remember and what the pipeline actually captured, loss of coordination context across agents, and fragile recovery after failures. In this paper, we propose ByteRover, an agent-native memory architecture that inverts the memory pipeline: the same LLM that reasons about a task also curates, structures, and retrieves knowledge. ByteRover represents knowledge in a hierarchical Context Tree, a file-based knowledge graph organized as Domain, Topic, Subtopic, and Entry, where each entry carries explicit relations, provenance, and an Adaptive Knowledge Lifecycle (AKL) with importance scoring, maturity tiers, and recency decay. Retrieval uses a 5-tier progressive strategy that resolves most queries at sub-100 ms latency without LLM calls, escalating to agentic reasoning only for novel questions. Experiments on LoCoMo and LongMemEval demonstrate that ByteRover achieves state-of-the-art accuracy on LoCoMo and competitive results on LongMemEval while requiring zero external infrastructure: no vector database, no graph database, and no embedding service, with all knowledge stored as human-readable markdown files on the local filesystem.


Paper Structure

This paper contains 42 sections, 5 equations, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Architectural overview of ByteRover. Clients (TUI, CLI, MCP) connect via Socket.IO to a daemon that manages a per-project task queue and agent pool. Each agent process contains three logical layers: (1) an Agent Layer where curate and search_knowledge are first-class tools in the LLM's reasoning loop; (2) an Execution Layer with a query executor for 5-tier progressive retrieval and a sandboxed curation environment; and (3) a Knowledge Layer with the Context Tree, BM25 full-text index, and query cache, all backed by the local filesystem with no external infrastructure.
  • Figure 2: The 5-tier progressive retrieval pipeline. Search is initiated in parallel with fingerprint computation. Tiers 0–1 resolve from cache without awaiting search. Tier 2 serves high-confidence results directly from MiniSearch. Only novel or ambiguous queries escalate to Tier 3 (single optimized LLM call with pre-fetched context) or Tier 4 (full agentic loop with tool access). Approximate latencies shown on right.
  • Figure 3: A complete knowledge entry in the Context Tree, showing the YAML frontmatter with lifecycle metadata, explicit relation annotations, raw concept (provenance), and narrative (interpreted structure).
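Figure 3's entry format can be sketched as a markdown file with YAML frontmatter. The field names and values below are illustrative assumptions consistent with the abstract's description (lifecycle metadata, explicit relations, provenance, and narrative), not the exact schema:

```markdown
---
# Illustrative AKL lifecycle metadata (field names are assumptions)
importance: 0.8            # importance score
maturity: stable           # maturity tier
last_accessed: 2025-01-15  # input to recency decay
relations:
  - refines: auth/jwt/token-expiry   # explicit relation to another entry
---

## Raw concept (provenance)
Verbatim observation captured during the originating task.

## Narrative
The curating LLM's interpreted, structured account of the concept.
```

Because entries are plain markdown on the local filesystem, they remain human-readable and greppable without any external database.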
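To make the escalation logic behind Figure 2 concrete, the following is a minimal sketch of a 5-tier progressive retrieval loop. All names (`progressive_retrieve`, the tier functions, the cache contents) are illustrative assumptions, not ByteRover's actual API; the point is only the control flow, in which each tier either answers or defers to the next, so cheap tiers short-circuit expensive ones.

```python
# Hypothetical sketch of progressive tiered retrieval; not ByteRover's real API.
# Each tier returns an answer string, or None to escalate to the next tier.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class TierResult:
    answer: Optional[str]  # None means no tier could answer
    tier: int              # index of the tier that answered (or len(tiers))

def progressive_retrieve(query: str,
                         tiers: List[Callable[[str], Optional[str]]]) -> TierResult:
    """Try tiers in order of cost; return the first confident answer."""
    for i, tier in enumerate(tiers):
        answer = tier(query)
        if answer is not None:
            return TierResult(answer, i)
    return TierResult(None, len(tiers))

# Illustrative tiers: exact query cache, full-text index (stubbed),
# and a single LLM call as the expensive fallback.
cache = {"what is akl": "Adaptive Knowledge Lifecycle"}

def tier0_exact_cache(q: str) -> Optional[str]:
    return cache.get(q.lower())          # sub-ms lookup, no LLM involved

def tier2_fulltext(q: str) -> Optional[str]:
    return None                          # stub: no BM25 index in this sketch

def tier3_llm_call(q: str) -> Optional[str]:
    return "LLM answer for: " + q        # placeholder for one optimized LLM call

result = progressive_retrieve("What is AKL",
                              [tier0_exact_cache, tier2_fulltext, tier3_llm_call])
```

In this toy run the cached query resolves at tier 0 and never touches the LLM tiers, mirroring the paper's claim that most queries complete at sub-100 ms latency without LLM calls.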