Learning to Continually Learn via Meta-learning Agentic Memory Designs

Yiming Xiong, Shengran Hu, Jeff Clune

TL;DR

The stateless nature of foundation models constrains long-horizon continual learning in agentic systems. The paper introduces ALMA, a framework in which a Meta Agent searches over executable memory-design code to automate the design of memory components and enable continual learning across diverse domains. It deploys a multi-layer memory architecture with an orchestrator that semantically routes information between layers. Across four sequential decision-making benchmarks, it outperforms state-of-the-art human-crafted memory designs, transfers to stronger models, and is cost efficient. The work advances automated memory design for lifelong AI while addressing safety and scalability considerations for real-world deployment.
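
The orchestrator idea in the TL;DR can be illustrated with a small sketch. This is not ALMA's code. The layer names (`strategies`, `object_locations`, `episodes`), the `MemoryLayer` and `Orchestrator` classes, and the keyword-overlap scoring are all hypothetical. Keyword overlap stands in for semantic routing, which a real system would more likely implement with embeddings or an FM call.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLayer:
    """A hypothetical memory layer: a named store plus topic keywords."""
    name: str
    keywords: set[str]
    entries: list[str] = field(default_factory=list)


class Orchestrator:
    """Routes writes and reads to the best-matching memory layer.

    Keyword overlap is a stand-in for semantic routing (an assumption);
    a real system might compare embeddings or ask the FM to pick a layer.
    """

    def __init__(self, layers: list[MemoryLayer], default: str) -> None:
        self.layers = {layer.name: layer for layer in layers}
        self.default = default  # layer used when no keywords match

    def _score(self, text: str, layer: MemoryLayer) -> int:
        # Number of words in `text` that appear among the layer's keywords.
        return len(set(text.lower().split()) & layer.keywords)

    def route(self, text: str) -> str:
        best = max(self.layers.values(), key=lambda layer: self._score(text, layer))
        return best.name if self._score(text, best) > 0 else self.default

    def write(self, text: str) -> str:
        name = self.route(text)
        self.layers[name].entries.append(text)
        return name

    def read(self, query: str) -> list[str]:
        return list(self.layers[self.route(query)].entries)


orch = Orchestrator(
    [
        MemoryLayer("strategies", {"strategy", "plan", "goal"}),
        MemoryLayer("object_locations", {"found", "located", "inside"}),
        MemoryLayer("episodes", set()),
    ],
    default="episodes",
)
print(orch.write("strategy check receptacles near the goal object first"))  # -> strategies
print(orch.write("mug found inside cabinet 3"))  # -> object_locations
print(orch.write("episode 7 ended after 30 steps"))  # -> episodes
print(orch.read("where was the mug found"))  # -> ['mug found inside cabinet 3']
```

Routing the read query through the same function as writes keeps retrieval consistent with storage. The trade-off is that a query phrased differently from the stored text can land in the wrong layer, which is the gap a semantic router is meant to close.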

Abstract

The statelessness of foundation models bottlenecks agentic systems' ability to continually learn, a core capability for long-horizon reasoning and adaptation. To address this limitation, agentic systems commonly incorporate memory modules to retain and reuse past experience, aiming for continual learning at test time. However, most existing memory designs are human-crafted and fixed, which limits their ability to adapt to the diversity and non-stationarity of real-world tasks. In this paper, we introduce ALMA (Automated meta-Learning of Memory designs for Agentic systems), a framework that meta-learns memory designs to replace hand-engineered ones, thereby minimizing human effort and enabling agentic systems to be continual learners across diverse domains. Our approach employs a Meta Agent that searches over memory designs expressed as executable code in an open-ended manner, theoretically allowing the discovery of arbitrary memory designs, including database schemas as well as their retrieval and update mechanisms. Extensive experiments across four sequential decision-making domains demonstrate that the learned memory designs enable more effective and efficient learning from experience than state-of-the-art human-crafted memory designs on all benchmarks. When developed and deployed safely, ALMA represents a step toward self-improving AI systems that learn to be adaptive, continual learners.
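
The open-ended search described in the abstract (and in the Figure 1 and Figure 2 captions) can be sketched as an archive-based loop. Several details here are assumptions, not the paper's specification. Parents are sampled uniformly, designs that fail verification are simply dropped, and the functions `propose`, `implement`, `verify`, and `evaluate` are hypothetical callables standing in for Meta Agent and agentic-system calls. The `toy_*` stubs exist only so the sketch runs.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Design:
    """One archive entry: design code, its score, logs, and parent index."""
    code: str
    score: float
    parent: Optional[int]
    logs: List[str] = field(default_factory=list)


def open_ended_search(
    root_code: str,
    propose: Callable[[str, List[str]], str],
    implement: Callable[[str, str], str],
    verify: Callable[[str], bool],
    evaluate: Callable[[str], Tuple[float, List[str]]],
    iterations: int,
    seed: int = 0,
) -> Tuple[Design, List[Design]]:
    rng = random.Random(seed)
    score, logs = evaluate(root_code)
    archive = [Design(root_code, score, None, logs)]
    for _ in range(iterations):
        # Sample any archived design as the parent (uniform sampling is an
        # assumption); low scorers stay available as stepping stones.
        parent_idx = rng.randrange(len(archive))
        parent = archive[parent_idx]
        plan = propose(parent.code, parent.logs)  # ideate by reflecting on code + logs
        code = implement(parent.code, plan)  # program the new design
        if not verify(code):  # drop designs that fail correctness checks
            continue
        score, logs = evaluate(code)  # evaluate inside an agentic system
        archive.append(Design(code, score, parent_idx, logs))
    best = max(archive, key=lambda d: d.score)  # highest score becomes the final design
    return best, archive


# Toy stand-ins for the FM-driven steps (hypothetical, illustration only).
def toy_propose(code: str, logs: List[str]) -> str:
    return "add one retrieval heuristic"


def toy_implement(code: str, plan: str) -> str:
    return code + "+"


def toy_verify(code: str) -> bool:
    return len(code) <= 8  # pretend oversized designs fail verification


def toy_evaluate(code: str) -> Tuple[float, List[str]]:
    rate = code.count("+") / 10
    return rate, [f"success_rate={rate:.1f}"]


best, archive = open_ended_search(
    "", toy_propose, toy_implement, toy_verify, toy_evaluate, iterations=20
)
print(len(archive), best.score)
```

Each archive entry records its parent index, so the archive is the tree drawn in Figure 2, and the path from the root to `best` can be recovered by following `parent` links.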

Paper Structure

This paper contains 37 sections, 12 equations, 14 figures, 2 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Open-ended Exploration Process of ALMA. The Meta Agent first ideates and proposes a plan by reflecting on the code and evaluation logs of the sampled memory design. It then implements the plan by programming the new design in code. Finally, it verifies the correctness of the new memory design and evaluates it with an agentic system. The evaluated memory design is subsequently added to the memory design archive for future sampling.
  • Figure 2: The learning process of ALMA on Baba Is AI, using GPT-5-nano as the FM in an agentic system. The learning processes for the other benchmarks are shown in the Appendix. Left: The memory design archive tree, where each node represents a memory design produced during the open-ended exploration for ever-better memory designs. Node colors indicate the success rate, and each edge indicates that the child node is derived from its parent. The memory design with the highest success rate is used as the final learned memory design. Right: The step-wise learning progress. ALMA progressively discovers memory designs by building on an ever-growing archive of previous discoveries. The path from the root memory design to the best memory design highlights the importance of open-ended exploration: designs with only moderate success rates serve as stepping stones toward the best-performing designs.
  • Figure 3: Visualization of the best learned memory designs across different benchmarks. Each sub-module in a memory design may have a dedicated database or none, depending on its function, and arrows show the retrieval and update workflows. Sub-module names are generated by the Meta Agent, and their explanations are manually summarized. Example code and output for a learned memory design are provided.
  • Figure 4: Success rates of different memory designs in ALFWorld as the number of tasks in the Memory Collection Phase increases. Evaluations use static mode during the Deployment Phase to study how performance scales with the amount of collected static memory. GPT-5-mini is used as the FM in the agentic system during testing. Shaded areas indicate standard error over three runs of the Deployment Phase. The learned memory design reaches higher performance with less collected data and scales better than human-designed baselines.
  • Figure 5: Success rates of different memory designs in ALFWorld under task distribution shift in the Deployment Phase. Memory is collected from tasks in the valid_seen split during the Memory Collection Phase, and the agentic system is then evaluated on the valid_unseen split in dynamic mode during the Deployment Phase, using GPT-5-mini as the FM. Error bars indicate standard errors over three runs of the Deployment Phase. The learned memory design adapts more effectively than human-designed baselines under task distribution shift.
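
The two evaluation settings in Figures 4 and 5 (a Memory Collection Phase followed by a Deployment Phase in static or dynamic mode) can be contrasted in a toy sketch. The captions do not define the modes. This sketch assumes static mode only reads the collected memory and dynamic mode also writes each deployment outcome back. `TrajectoryMemory`, `run_task`, `collect`, `deploy`, and the `type:difficulty` task names are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrajectoryMemory:
    """Hypothetical memory design storing (task, success) records."""
    records: List[Tuple[str, bool]] = field(default_factory=list)

    def update(self, task: str, success: bool) -> None:
        self.records.append((task, success))

    def retrieve(self, task: str) -> List[Tuple[str, bool]]:
        # Return records whose task type (text before ':') matches.
        kind = task.split(":")[0]
        return [r for r in self.records if r[0].split(":")[0] == kind]


def run_task(task: str, memory: TrajectoryMemory) -> bool:
    """Toy agent: solves tasks ending in 'easy' outright, and other tasks
    only if memory holds a past success on the same task type."""
    if task.endswith("easy"):
        return True
    return any(ok for _, ok in memory.retrieve(task))


def collect(tasks: List[str], memory: TrajectoryMemory) -> None:
    """Memory Collection Phase: run tasks and store every outcome."""
    for task in tasks:
        memory.update(task, run_task(task, memory))


def deploy(tasks: List[str], memory: TrajectoryMemory, mode: str) -> float:
    """Deployment Phase: static mode only reads memory; dynamic mode
    (assumed meaning) also writes each new outcome back."""
    if mode not in ("static", "dynamic"):
        raise ValueError(f"unknown mode: {mode}")
    successes = 0
    for task in tasks:
        ok = run_task(task, memory)
        successes += ok
        if mode == "dynamic":
            memory.update(task, ok)  # keep learning at test time
    return successes / len(tasks)


base = TrajectoryMemory()
collect(["heat:easy", "clean:hard"], base)
test_tasks = ["heat:hard", "cool:easy", "cool:hard"]
# Deep copies keep the collected memory identical for both modes.
static_rate = deploy(test_tasks, copy.deepcopy(base), "static")
dynamic_rate = deploy(test_tasks, copy.deepcopy(base), "dynamic")
print(static_rate, dynamic_rate)  # -> 0.6666666666666666 1.0
```

In this toy run, only dynamic mode solves `cool:hard`, because it can reuse the `cool:easy` success recorded earlier in the same deployment. That is the kind of test-time adaptation the task-distribution-shift setting in Figure 5 is meant to probe.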