Towards Autonomous Memory Agents

Xinle Wu; Rui Zhang; Mustafa Anis Hussain; Yao Lu

TL;DR

U-Mem proposes autonomous memory agents that actively acquire, validate, and curate knowledge at minimal cost; it can surpass RL-based optimization on both verifiable and non-verifiable benchmarks.

Abstract

Recent memory agents improve LLMs by extracting experiences and conversation history into external storage. This enables low-overhead context assembly and online memory updates without expensive LLM training. However, existing solutions remain passive and reactive: memory growth is bounded by whatever information happens to be available, and memory agents seldom seek external input when they are uncertain. We propose autonomous memory agents that actively acquire, validate, and curate knowledge at minimal cost. U-Mem realizes this idea via (i) a cost-aware knowledge-extraction cascade that escalates from cheap self/teacher signals to tool-verified research and, only when needed, expert feedback, and (ii) semantic-aware Thompson sampling to balance exploration and exploitation over memories and mitigate cold-start bias. On both verifiable and non-verifiable benchmarks, U-Mem consistently beats prior memory baselines and can surpass RL-based optimization, improving HotpotQA (Qwen2.5-7B) by 14.6 points and AIME25 (Gemini-2.5-flash) by 7.33 points.
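
The two components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the escalation rule, the form of the posterior, or how semantic similarity enters the sampler. The confidence threshold, the Beta posteriors, the similarity-weighted score, and all names (`acquire_knowledge`, `MemoryItem`, `select_memory`, `update`) are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass


def acquire_knowledge(task, sources, threshold=0.8):
    """Cost-aware cascade (illustrative): try sources from cheapest to most
    expensive and stop at the first one whose answer is confident enough.

    sources: list of (name, cost, fn), where fn(task) -> (knowledge, confidence).
    The threshold and the (knowledge, confidence) interface are assumptions.
    Returns (knowledge, source_name, total_cost_spent).
    """
    total_cost = 0.0
    for name, cost, fn in sorted(sources, key=lambda s: s[1]):
        total_cost += cost
        knowledge, confidence = fn(task)
        if confidence >= threshold:
            return knowledge, name, total_cost
    # No source was confident enough; the caller decides what to do next.
    return None, None, total_cost


@dataclass
class MemoryItem:
    """A stored memory with a Beta(alpha, beta) belief about its usefulness."""
    text: str
    similarity: float   # assumed: semantic similarity to the current query, in [0, 1]
    alpha: float = 1.0  # pseudo-count of helpful uses (uniform prior)
    beta: float = 1.0   # pseudo-count of unhelpful uses (uniform prior)


def select_memory(memories, rng):
    """Thompson sampling: draw one usefulness sample per memory, weight it by
    semantic similarity (an assumed combination rule), and return the best."""
    return max(memories, key=lambda m: m.similarity * rng.betavariate(m.alpha, m.beta))


def update(memory, helped):
    """Update the Beta posterior after observing whether the memory helped."""
    if helped:
        memory.alpha += 1.0
    else:
        memory.beta += 1.0
```

Because a new memory starts from a uniform Beta(1, 1) belief, its sampled usefulness is sometimes high, so it still gets tried occasionally. This is one common way a Thompson-style sampler can reduce cold-start bias toward memories that already have a track record.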

Paper Structure (18 sections, 7 equations, 12 figures, 5 tables)

Introduction
Related Work
Methods
Problem Formulation
Overview of U-Mem
Memory for Verifiable Tasks
Memory for Non-Verifiable Tasks
Experiments
Experimental Setup
Main Results
Ablation Studies
More Analysis
Case Study
Conclusion
Reproducibility Statement
...and 3 more sections

Figures (12)

Figure 1: Passive vs. autonomous memory agents.
Figure 2: Overview of U-Mem.
Figure 3: Scaling trend of U-Mem on HotpotQA.
Figure 4: Correlation between task similarity and memory benefit. Each point denotes a task; the x-axis is AMCS (task similarity), and the y-axis is U-Mem’s performance gain over the base model. Pearson correlation: r = 0.888.
Figure 5: Comparison of average token usage across U-Mem, ReasoningBank, ReMe, MemRL, and No Mem.
...and 7 more figures
