MemEvolve: Meta-Evolution of Agent Memory Systems
Guibin Zhang, Haotian Ren, Chong Zhan, Zhenhong Zhou, Junhao Wang, He Zhu, Wangchunshu Zhou, Shuicheng Yan
TL;DR
MemEvolve introduces a meta-evolutionary framework that jointly evolves agent experiences and their memory architectures, addressing the staticity of traditional memory systems. Built on the modular EvolveLab codebase, MemEvolve employs a dual-evolution process with diagnose-and-design refinement to generate memory systems that generalize across tasks, frameworks, and backbones. Empirical results on GAIA, WebWalkerQA, xBench-DS, and TaskCraft show substantial performance gains (up to 17.06%) and robust cross-domain generalization at comparable cost. The work provides a standardized platform and actionable design principles for future self-improving agents, highlighting the value of adaptive, hierarchical, multi-level memory abstractions.
Abstract
Self-evolving memory systems are reshaping the evolutionary paradigm of large language model (LLM)-based agents. Prior work has predominantly relied on manually engineered memory architectures to store trajectories, distill experience, and synthesize reusable tools, enabling agents to evolve on the fly within environment interactions. However, this paradigm is fundamentally constrained by the staticity of the memory system itself: while memory facilitates agent-level evolution, the underlying memory architecture cannot be meta-adapted to diverse task contexts. To address this gap, we propose MemEvolve, a meta-evolutionary framework that jointly evolves agents' experiential knowledge and their memory architecture, allowing agent systems not only to accumulate experience but also to progressively refine how they learn from it. To ground MemEvolve in prior research and foster openness in future self-evolving systems, we introduce EvolveLab, a unified self-evolving memory codebase that distills twelve representative memory systems into a modular design space (encode, store, retrieve, manage), providing both a standardized implementation substrate and a fair experimental arena. Extensive evaluations on four challenging agentic benchmarks demonstrate that MemEvolve achieves (I) substantial performance gains, improving frameworks such as SmolAgent and Flash-Searcher by up to $17.06\%$; and (II) strong cross-task and cross-LLM generalization, designing memory architectures that transfer effectively across diverse benchmarks and backbone models.
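To make the modular design space (encode, store, retrieve, manage) more concrete, the following is a minimal sketch of how a memory system might be assembled from four swappable modules. It is an illustration under stated assumptions, not EvolveLab's actual API: the class `MemorySystem`, the helpers `keyword_retrieve` and `dedupe`, and all signatures are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemorySystem:
    """Hypothetical four-module memory; names are illustrative, not EvolveLab's API."""
    encode: Callable[[str], str]                            # trajectory -> stored experience
    retrieve: Callable[[List[str], str, int], List[str]]    # (items, query, k) -> top-k items
    manage: Callable[[List[str]], List[str]]                # consolidation / pruning policy
    items: List[str] = field(default_factory=list)          # the store itself

    def store(self, trajectory: str) -> None:
        # Encode the raw trajectory, append it, then let the manage module consolidate.
        self.items.append(self.encode(trajectory))
        self.items = self.manage(self.items)

    def recall(self, query: str, k: int = 2) -> List[str]:
        return self.retrieve(self.items, query, k)


def keyword_retrieve(items: List[str], query: str, k: int) -> List[str]:
    # Toy retriever (an assumption): rank items by word overlap with the query.
    words = set(query.lower().split())
    ranked = sorted(items, key=lambda it: len(words & set(it.lower().split())), reverse=True)
    return ranked[:k]


def dedupe(items: List[str]) -> List[str]:
    # Toy manager (an assumption): drop exact duplicates, preserving insertion order.
    return list(dict.fromkeys(items))


memory = MemorySystem(
    encode=lambda t: t.strip().lower(),
    retrieve=keyword_retrieve,
    manage=dedupe,
)
memory.store("  Use site search before browsing  ")
memory.store("Use site search before browsing")
memory.store("Check units when converting values")
```

Because each module is an independent callable, a meta-level search of the kind the abstract describes could in principle swap in a different `encode`, `retrieve`, or `manage` function without touching the others; how MemEvolve actually performs that search is not specified in the text above.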
