MemEvolve: Meta-Evolution of Agent Memory Systems

Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, Shuicheng Yan

TL;DR

MemEvolve introduces a meta-evolutionary framework that jointly evolves agent experiences and their memory architectures, addressing the static nature of traditional memory systems. Grounded in the EvolveLab modular codebase, MemEvolve employs a dual-evolution process with diagnose-and-design refinement to generate memory systems that generalize across tasks, frameworks, and backbones. Empirical results across GAIA, WebWalkerQA, xBench-DS, and TaskCraft show substantial performance gains (up to 17.06%) and robust cross-domain generalization, while maintaining comparable costs. The work provides a standardized platform and actionable design principles for future self-improving agents, highlighting the value of adaptive, hierarchical, and multi-level memory abstractions.
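The dual-evolution idea can be pictured as two nested loops: an inner loop in which a fixed memory architecture accumulates experience over tasks, and an outer loop that changes the architecture itself. The Python toy below is a sketch under stated assumptions, not MemEvolve's implementation. `MemoryConfig`, its two knobs (`top_k`, `max_items`), the recency-based retrieval, and the integer "tasks" are all hypothetical, and the outer loop uses a plain mutate-and-keep-if-better rule as a stand-in for the paper's diagnose-and-design refinement.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryConfig:
    """Hypothetical memory 'architecture': the knobs the outer loop may change."""
    top_k: int      # how many recent entries to retrieve per task (>= 1)
    max_items: int  # capacity; older entries are pruned beyond this


def run_inner_loop(cfg: MemoryConfig, tasks: list[int]) -> float:
    """Inner loop (experience evolution) under a fixed architecture.

    Toy agent: a task counts as solved if the same task value is among the
    retrieved entries. Returns the fraction of tasks solved.
    """
    if not tasks:
        return 0.0
    memory: list[int] = []
    solved = 0
    for task in tasks:
        retrieved = memory[-cfg.top_k:]    # retrieve: most recent top_k entries
        solved += task in retrieved        # bool counts as 0 or 1
        memory.append(task)                # store the new experience
        memory = memory[-cfg.max_items:]   # manage: prune to capacity
    return solved / len(tasks)


def evolve_architecture(seed_cfg: MemoryConfig, tasks: list[int],
                        generations: int = 10, seed: int = 0) -> tuple[MemoryConfig, float]:
    """Outer loop (architecture evolution): mutate the config, keep improvements.

    A plain mutate-and-select rule, used here only as a stand-in for
    diagnose-and-design refinement.
    """
    rng = random.Random(seed)
    best_cfg, best_score = seed_cfg, run_inner_loop(seed_cfg, tasks)
    for _ in range(generations):
        candidate = replace(
            best_cfg,
            top_k=max(1, best_cfg.top_k + rng.choice([-1, 1, 2])),
            max_items=max(1, best_cfg.max_items + rng.choice([-2, 2, 4])),
        )
        score = run_inner_loop(candidate, tasks)
        if score > best_score:
            best_cfg, best_score = candidate, score
    return best_cfg, best_score


tasks = [1, 2, 3] * 4
print(run_inner_loop(MemoryConfig(top_k=3, max_items=3), tasks))  # 0.75
print(evolve_architecture(MemoryConfig(top_k=1, max_items=1), tasks))
```

Under these assumptions, a three-entry memory solves 0.75 of the tasks (the first three cannot be answered from an empty memory), while a one-entry memory scores 0.0, so the outer loop has room to improve on a poor seed architecture.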

Abstract

Self-evolving memory systems are reshaping the evolutionary paradigm of large language model (LLM)-based agents to an unprecedented extent. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly during environment interactions. However, this paradigm is fundamentally constrained by the static nature of the memory system itself: while memory facilitates agent-level evolution, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to 17.06%; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.
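The four-stage design space (encode, store, retrieve, manage) can be pictured as a pipeline of pluggable functions. The Python sketch below is a hypothetical illustration, not EvolveLab's actual interface: the stage signatures, the `ModularMemory` class, and the example modules (lowercase encoding, append-only storage, keyword-overlap retrieval, a three-entry capacity cap) are assumptions chosen for brevity. Swapping any one stage yields a different memory architecture, which is the kind of space a meta-evolutionary search can explore.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage signatures for the four modules named in the abstract.
Encode = Callable[[str], str]                     # raw trajectory -> memory entry
Store = Callable[[list[str], str], None]          # add an entry to the store
Retrieve = Callable[[list[str], str], list[str]]  # task query -> relevant entries
Manage = Callable[[list[str]], list[str]]         # prune or consolidate the store


@dataclass
class ModularMemory:
    """A memory system assembled from four interchangeable stage functions."""
    encode: Encode
    store: Store
    retrieve: Retrieve
    manage: Manage
    entries: list[str] = field(default_factory=list)

    def add(self, trajectory: str) -> None:
        self.store(self.entries, self.encode(trajectory))
        self.entries = self.manage(self.entries)

    def query(self, task: str) -> list[str]:
        return self.retrieve(self.entries, task)


def keyword_retrieve(entries: list[str], task: str) -> list[str]:
    """Return entries sharing at least one word with the task."""
    words = set(task.lower().split())
    return [e for e in entries if words & set(e.split())]


# One point in the design space (all module choices are illustrative).
memory = ModularMemory(
    encode=str.lower,                            # normalize text
    store=lambda entries, e: entries.append(e),  # append-only storage
    retrieve=keyword_retrieve,                   # keyword-overlap retrieval
    manage=lambda entries: entries[-3:],         # keep the 3 newest entries
)

for traj in ["Search GAIA docs", "Open WebWalker page", "Summarize results", "Search arXiv"]:
    memory.add(traj)
print(memory.query("search papers"))  # ['search arxiv']
```

The capacity cap has already pruned "search gaia docs", so only the newer matching entry is returned; replacing `manage` with, for example, a rule that merges near-duplicate entries would change the architecture without touching the other three stages.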

Paper Structure

This paper contains 23 sections, 7 equations, 10 figures, and 3 tables.

Figures (10)

  • Figure 1: The comparison between MemEvolve and several popular self-evolving agent memory systems across benchmarks. The underlying framework is Flash-Searcher (qin2025flashsearcherfasteffectiveweb) + GPT-5-Mini.
  • Figure 2: The paradigm of agent self-evolution admits a natural analogy to human learning. At one extreme, a mediocre learner fails to benefit from experience (agents without memory). More capable, skillful learners can extract reusable skills from past experience, albeit through a fixed, predefined abstraction scheme. In contrast, an adaptive learner simultaneously accumulates experience and dynamically adjusts the strategy by which experience is consolidated and utilized. This final regime precisely characterizes the objective of MemEvolve.
  • Figure 3: The overview of our proposed MemEvolve.
  • Figure 4: The cross-framework generalization analysis. We transfer the memory system evolved on TaskCraft+to and . Red percentages denote the relative score gains of each framework after integrating MemEvolve over its memory-free counterpart.
  • Figure 5: Evolution of cumulative accuracy across question indices. Cumulative accuracy at index $i$ is defined as the average accuracy over the first $i$ questions. The curves exhibit larger fluctuations at early indices due to limited sample size, and gradually stabilize as more questions are accumulated.
  • ...and 5 more figures
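The two quantities plotted in Figures 4 and 5 are straightforward to compute. The sketch below assumes per-question correctness is recorded as booleans, and it assumes "relative score gain" means the percentage change over the memory-free baseline, (s_mem - s_base) / s_base × 100; the caption does not spell out the formula, so treat that definition as an assumption. The function names are hypothetical.

```python
from itertools import accumulate


def cumulative_accuracy(correct: list[bool]) -> list[float]:
    """Element i-1 is the mean accuracy over the first i questions."""
    running_totals = accumulate(int(c) for c in correct)
    return [total / i for i, total in enumerate(running_totals, start=1)]


def relative_gain(score_with_memory: float, score_without_memory: float) -> float:
    """Percentage change over the memory-free baseline (assumed definition)."""
    return 100.0 * (score_with_memory - score_without_memory) / score_without_memory


print(cumulative_accuracy([True, False, True, True]))  # [1.0, 0.5, 0.6666666666666666, 0.75]
print(relative_gain(60.0, 50.0))                       # 20.0
```

Adding question i moves the running mean by at most 1/i, which is why the cumulative-accuracy curves fluctuate more at small indices and settle as questions accumulate.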