Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei

TL;DR

This work introduces Memex, an indexed experience memory mechanism that compresses context without discarding evidence, and provides a theoretical analysis showing that the Memex loop has the potential to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows.

Abstract

Large language model (LLM) agents are fundamentally bottlenecked by finite context windows on long-horizon tasks. As trajectories grow, retaining tool outputs and intermediate reasoning in-context quickly becomes infeasible: the working context becomes prohibitively long, eventually exceeds the context budget, and makes distant evidence harder to use even when it is still present. Existing solutions typically shorten context through truncation or running summaries, but these methods are fundamentally lossy because they compress or discard past evidence itself. We introduce Memex, an indexed experience memory mechanism that instead compresses context without discarding evidence. Memex maintains a compact working context consisting of concise structured summaries and stable indices, while storing full-fidelity underlying interactions in an external experience database under those indices. The agent can then decide when to dereference an index and recover the exact past evidence needed for the current subgoal. We optimize both write and read behaviors with our reinforcement learning framework MemexRL, using reward shaping tailored to indexed memory usage under a context budget, so the agent learns what to summarize, what to archive, how to index it, and when to retrieve it. This yields a substantially less lossy form of long-horizon memory than summary-only approaches. We further provide a theoretical analysis showing the potential of the Memex loop to preserve decision quality with bounded dereferencing while keeping effective in-context computation bounded as history grows. Empirically, on challenging long-horizon tasks, a Memex agent trained with MemexRL improves task success while using a significantly smaller working context.
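
The write and read operations described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `IndexedMemory`, the `exp-N` index format, and the choice to archive the entire working context at once are assumptions made for illustration. Only the operation names CompressExperience and ReadExperience come from the paper's Figure 1 caption, and the summary string here is supplied by the caller rather than learned.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedMemory:
    """Toy indexed experience memory (hypothetical, for illustration only)."""

    store: dict[str, str] = field(default_factory=dict)  # external key-value store
    context: list[str] = field(default_factory=list)     # compact working context
    _next_id: int = 0                                     # counter for stable indices

    def append(self, text: str) -> None:
        """Add a raw tool output or reasoning step to the working context."""
        self.context.append(text)

    def compress_experience(self, summary: str) -> str:
        """Archive the current working context under a fresh index, keep only the summary."""
        index = f"exp-{self._next_id}"  # assumed index format
        self._next_id += 1
        self.store[index] = "\n".join(self.context)  # full-fidelity copy, nothing discarded
        self.context = [f"[{index}] {summary}"]
        return index

    def read_experience(self, index: str) -> str:
        """Dereference an index and re-inject the exact archived content into the context."""
        content = self.store[index]
        self.context.append(content)
        return content


memory = IndexedMemory()
memory.append("tool: ls -> a.txt b.txt")
memory.append("tool: cat a.txt -> secret=42")
idx = memory.compress_experience("Listed files; a.txt holds the secret value.")
recovered = memory.read_experience(idx)
```

After `compress_experience`, the working context holds a single indexed summary line while the store keeps the verbatim trajectory, so `read_experience` returns the original tool output exactly. In the paper, deciding what to summarize, how to index it, and when to dereference is learned with MemexRL; this sketch only shows the data flow.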

Paper Structure (23 sections, 2 theorems, 5 equations, 5 figures, 1 algorithm)

This paper contains 23 sections, 2 theorems, 5 equations, 5 figures, 1 algorithm.

Key Result

Proposition 1

Let $J(\pi)$ denote the expected return of policy $\pi$. Assume that $\sigma_t$ is $B$-bounded decision-sufficient for every step $t$. Then there exists a Memex policy $\pi_{\mathrm{IEM}}$ that conditions only on $\sigma_t$ and uses at most $B$ calls to ReadExperience$(\cdot)$ per step such that $J(

Figures (5)

  • Figure 1: Memex agent loop overview. CompressExperience replaces a long tool-use trajectory in the context with a compact indexed summary, while storing detailed contents in an external key–value store. Later, ReadExperience(index) dereferences an index to retrieve the exact content and re-inject it into the context, enabling long-horizon execution under a small context window.
  • Figure 2: Task success rates for rollouts during training. The agent's task success rate improves from approximately 20% to over 90%, demonstrating that MemexRL training effectively teaches the model to solve tasks using the Memex agent loop.
  • Figure 3: Total penalty for rollouts during training. The penalty decreases in magnitude from $-0.4$ to approximately $-0.1$, showing that the agent learns to execute tasks better and to strategically reduce the peak working context length using CompressExperience and ReadExperience.
  • Figure 4: Effectiveness of MemexRL. (a) Task success rate improves from 24.2% to 85.6%. (b) Peak working context length reduces from 16,934 to 9,634 tokens, approaching the penalty threshold of 8,000 tokens.
  • Figure 5: Memory tool usage on the evaluation set during training. (a) Compress count decreases from 6.5 to 3 as the agent completes tasks more efficiently. (b) Retrieve count increases from 1 to 6-7, showing that RL reinforces retrieval behavior rather than suppressing it.
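
The captions for Figures 3 and 4 refer to a penalty tied to peak working context length and a threshold of 8,000 tokens, but this summary does not give the penalty's functional form. The Python sketch below shows one plausible shape purely as an assumption: zero penalty at or below the threshold, growing linearly with the overflow, and capped. The function name `context_penalty`, the linear form, the cap, and the `weight` parameter are all hypothetical and are not taken from the paper.

```python
def context_penalty(peak_tokens: int, threshold: int = 8_000, weight: float = 1.0) -> float:
    """Hypothetical context-overflow penalty (not the paper's formula).

    Returns 0 when the peak working context fits within the threshold,
    and a negative value proportional to the overflow otherwise,
    capped at -weight.
    """
    overflow = max(0, peak_tokens - threshold)
    return -weight * min(1.0, overflow / threshold)


# Peak context lengths reported in Figure 4(b), before and after training.
before = context_penalty(16_934)
after = context_penalty(9_634)
at_threshold = context_penalty(8_000)
```

Under this assumed form, the peak length reported before training is penalized more heavily than the one reported after training, and a rollout that stays within the threshold incurs no penalty.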

Theorems & Definitions (7)

  • Definition 1: Indexed Summary
  • Definition 2: Indexed Experience Memory
  • Definition 3: Decision-sufficient indexed summary
  • Proposition 1: Memex can match a full-context optimal policy
  • Proposition 2: Memex keeps working context bounded
  • proof
  • proof