Memento-Skills: Let Agents Design Agents

Huichi Zhou; Siyuan Guo; Anjie Liu; Zhongwei Yu; Ziqin Gong; Bowen Zhao; Zhixun Chen; Menglong Zhang; Yihang Chen; Jinsong Li; Runyu Yang; Qiangbin Liu; Xinlei Yu; Jianmin Zhou; Na Wang; Chunyang Sun; Jun Wang

Abstract

We introduce \emph{Memento-Skills}, a generalist, continually learnable LLM agent system that functions as an \emph{agent-designing agent}: it autonomously constructs, adapts, and improves task-specific agents through experience. The system is built on a memory-based reinforcement learning framework with \emph{stateful prompts}, in which reusable skills (stored as structured markdown files) serve as persistent, evolving memory. These skills encode both behaviour and context, enabling the agent to carry knowledge forward across interactions. Starting from simple elementary skills (such as web search and terminal operations), the agent continually improves via the \emph{Read--Write Reflective Learning} mechanism introduced in \emph{Memento~2}~\cite{wang2025memento2}. In the \emph{read} phase, a behaviour-trainable skill router selects the most relevant skill conditioned on the current stateful prompt; in the \emph{write} phase, the agent updates and expands its skill library based on new experience. This closed-loop design enables \emph{continual learning without updating LLM parameters}, because all adaptation is realised through the evolution of externalised skills and prompts. Unlike prior approaches that rely on human-designed agents, Memento-Skills enables a generalist agent to \emph{design agents end-to-end} for new tasks. Through iterative skill generation and refinement, the system progressively improves its own capabilities. Experiments on the \emph{General AI Assistants} (GAIA) benchmark and \emph{Humanity's Last Exam} (HLE) demonstrate sustained gains, with relative improvements in overall accuracy of 26.2\% and 116.2\%, respectively. Code is available at https://github.com/Memento-Teams/Memento-Skills.
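The read and write phases described in the abstract can be pictured with a small sketch. It is illustrative only and is not the released implementation: the `Skill` and `SkillLibrary` classes, the word-overlap router, and the appended failure note are all hypothetical stand-ins (the paper's router is trained, and its write phase reflectively rewrites skills with the LLM). The utility increment on success follows the description of the loop in the Figure 5 caption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Skill:
    name: str
    body: str              # markdown text of the skill (hypothetical layout)
    utility: float = 0.0   # bumped when the skill succeeds


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def read(self, task: str) -> Skill:
        """Toy router (assumption): rank skills by word overlap with the
        task, breaking ties by utility. Stands in for the trained router."""
        task_words = set(task.lower().split())

        def score(skill: Skill) -> Tuple[int, float]:
            overlap = len(task_words & set(skill.body.lower().split()))
            return (overlap, skill.utility)

        return max(self.skills.values(), key=score)

    def write(self, skill: Skill, task: str, success: bool) -> None:
        """Toy write-back (assumption): reward success, annotate failure.
        The real system would rewrite the skill via LLM reflection."""
        if success:
            skill.utility += 1.0
        else:
            skill.body += f"\n- note: failed on task: {task}"


def run_round(
    library: SkillLibrary,
    task: str,
    execute: Callable[[Skill, str], bool],
) -> Tuple[str, bool]:
    """One Read -> Act -> Write round; no model parameters are touched."""
    skill = library.read(task)            # Read
    success = execute(skill, task)        # Act (frozen LLM in the paper)
    library.write(skill, task, success)   # Write
    return skill.name, success
```

All state that changes between rounds lives in `SkillLibrary`, which is the point of the design: the executor passed as `execute` can remain a fixed black box.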

Paper Structure (31 sections, 1 theorem, 9 equations, 12 figures)

The Self-Evolving Agent Problem
Why Frozen LLMs Need External Memory
Stateful Reflective Decision Process
From Zero to Self-Evolving Agent
From Theory to Configuration
Contributions.
Read--Write Reflective Learning
The Skill-Level Read--Write Loop
Self-evolving Architecture
InfoNCE Routing as a One-Step Soft Policy
Offline RL Router for Behaviour-Similar Retrieval.
Skill database and synthetic query generation.
Router score and multi-positive InfoNCE.
One-step offline $Q$-learning view.
Why InfoNCE matches "policy fitting" in one step.
...and 16 more sections

Key Result

Theorem 1.3

Under bounded rewards $|r| \leq R_{\max}$ and $\gamma < 1$, the KL-regularised soft policy iteration over the Reflected MDP converges to the optimal retrieval policy $\mu^*$.
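To make the statement concrete, the following is a minimal sketch of KL-regularised soft policy iteration on a generic tabular MDP. It assumes the textbook form of the update (evaluation with a KL penalty towards a prior policy, then improvement $\mu(a \mid s) \propto \mu_0(a \mid s)\exp(Q(s,a)/\alpha)$); the exact operator over the Reflected MDP is not given in this excerpt, so the tiny MDP, the temperature `alpha`, and the names `P`, `R`, and `prior` are all hypothetical.

```python
import math


def soft_policy_iteration(P, R, prior, gamma=0.9, alpha=0.5,
                          sweeps=50, eval_iters=100, tol=1e-10):
    """P[s][a][t]: transition probability, R[s][a]: bounded reward,
    prior[s][a]: strictly positive reference policy (assumption)."""
    n_s, n_a = len(R), len(R[0])
    pi = [row[:] for row in prior]
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        # Soft policy evaluation: expected Q minus alpha * KL(pi || prior).
        for _ in range(eval_iters):
            V = [
                sum(pi[s][a] * (Q[s][a] - alpha * math.log(pi[s][a] / prior[s][a]))
                    for a in range(n_a))
                for s in range(n_s)
            ]
            Q = [
                [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
                 for a in range(n_a)]
                for s in range(n_s)
            ]
        # Soft policy improvement: pi_new proportional to prior * exp(Q / alpha).
        new_pi = []
        for s in range(n_s):
            q_max = max(Q[s])  # subtract the max for numerical stability
            weights = [prior[s][a] * math.exp((Q[s][a] - q_max) / alpha)
                       for a in range(n_a)]
            total = sum(weights)
            new_pi.append([w / total for w in weights])
        delta = max(abs(new_pi[s][a] - pi[s][a])
                    for s in range(n_s) for a in range(n_a))
        pi = new_pi
        if delta < tol:
            break
    return pi, Q


# Hypothetical two-state example: action 0 stays, action 1 switches state;
# every action taken in state 1 earns reward 1, every action in state 0 earns 0.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
R = [[0.0, 0.0], [1.0, 1.0]]
prior = [[0.5, 0.5], [0.5, 0.5]]
policy, q_values = soft_policy_iteration(P, R, prior)
```

Because $\gamma < 1$ and the rewards are bounded, the evaluation step is a contraction, which is the condition the theorem relies on; in this toy example the resulting policy moves towards state 1 and then stays there, while retaining some probability on the other action because of the KL penalty.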

Figures (12)

Figure 1: Overview of the self-evolving results of Memento-Skills on two benchmarks. (a,b) show the progressive improvement in task performance across reflective learning rounds on HLE and GAIA. (c,d) show the corresponding growth of the skill memory, with learned skills organised into semantically meaningful clusters.
Figure 2: The three paradigms of LLM adaptation. Pre-training and fine-tuning update the model parameters $\theta$ and require large data and compute budgets. Deployment-time learning (this work) keeps $\theta$ frozen and instead accumulates experience in an external skill memory $\mathcal{M}$, enabling continual adaptation from live interactions at zero retraining cost.
Figure 3: Overview of the Read--Write Reflective Learning loop. Given a new task, the agent retrieves a relevant skill from the skill memory (Read), executes it through the frozen LLM (Act), and uses the resulting feedback to reflectively optimise and update the skill library (Write). The LLM parameters remain fixed throughout; all adaptation occurs in the memory.
Figure 4: The GUI of Memento-Skills.
Figure 5: The architecture of the Self-Evolving Agent based on Read--Write Reflective Learning. When a user submits a task, the agent uses a skill router to either retrieve an executable skill from its skill library or generate a new one from scratch, and then executes that skill to solve the problem. After execution, the system reflects on the outcome and writes back to the library: it increases the skill's utility score if the action succeeded, or optimises the underlying skill folders if it failed. This continuous read--write loop enables the agent to progressively expand and refine its capabilities through continual learning, entirely without updating the underlying LLM parameters.
...and 7 more figures

Theorems & Definitions (3)

Definition 1.1: Skill Memory
Definition 1.2: SRDP
Theorem 1.3: Convergence (Memento~2~\cite{wang2025memento2}, Thm. 8)
