Memento: Fine-tuning LLM Agents without Fine-tuning LLMs

Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang

TL;DR

Memento introduces a memory-based, gradient-free framework for adapting LLM agents through a memory-augmented MDP and case-based reasoning. By maintaining an episodic Case Bank and employing both non-parametric retrieval and a parametric Q-function within a planner–executor architecture, the approach enables continual learning without modifying LLM weights. Across GAIA, DeepResearcher, SimpleQA, and HLE, Memento achieves state-of-the-art or near-state-of-the-art results and shows notable OOD generalization, underscoring the value of episodic memory and memory rewriting for real-time skill acquisition. The work highlights the practicality of memory-based continual learning for open-ended deep research tasks and motivates further exploration of memory-driven LLM agent systems.

Abstract

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting as Memento, which attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7 to 9.6 absolute percentage points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/Memento.
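As a rough illustration of the non-parametric variant of episodic memory described in the abstract, the sketch below stores past cases and reads back the most similar ones by cosine similarity. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `Case`, `CaseBank`, `write`, and `read` are hypothetical, and the state embeddings are assumed to be precomputed vectors.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Case:
    state: list[float]  # embedding of the task (assumed to be precomputed)
    plan: str           # plan the agent chose for that task
    reward: float       # environment feedback, e.g. 1.0 for success


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class CaseBank:
    """Hypothetical episodic memory: append-only write, similarity-based read."""
    cases: list[Case] = field(default_factory=list)

    def write(self, state, plan, reward):
        # Memory rewriting: record the new experience; no model weights change.
        self.cases.append(Case(list(state), plan, reward))

    def read(self, query, k=2):
        # Memory reading: return the k stored cases most similar to the query.
        ranked = sorted(self.cases, key=lambda c: cosine(query, c.state), reverse=True)
        return ranked[:k]


bank = CaseBank()
bank.write([1.0, 0.0], "search the web, then summarise", 1.0)
bank.write([0.0, 1.0], "run code to parse the spreadsheet", 0.0)
bank.write([0.9, 0.1], "crawl the cited page directly", 1.0)

top = bank.read([1.0, 0.0], k=2)
for case in top:
    print(case.plan, case.reward)
```

In this toy setup, adaptation happens only through `write` and `read`: new cases change what later queries retrieve, while the LLM that would consume the retrieved cases stays frozen.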

Paper Structure

This paper contains 29 sections, 26 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of Memento evaluation across baselines, benchmarks, memory designs and generalisation.
  • Figure 2: A graphical model of the memory-based Markov Decision Process.
  • Figure 3: The architecture of Memento with parametric memory. Memento is instantiated as a planner–executor framework that alternates between Case-Based Planning (Stage 1) and Tool-Based Execution (Stage 2). The planner is an LLM-based CBR agent enhanced by a Case Memory module that supports two operations: Write, which records new cases and refines the Q-function online, and Read, which retrieves cases via the learned retrieval policy for adaptive case selection. The executor is an LLM-based MCP client that invokes external tools hosted on MCP servers through the MCP protocol.
  • Figure 4: Performance on SimpleQA and HLE. The SimpleQA results are from WebSailor (Li et al., 2025), and the HLE results are from the official website.
  • Figure 5: The average number of each task type per level, highlighting the dominance of code, search, and crawl tasks as difficulty level increases.
  • ...and 1 more figure
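The Write and Read operations named in the Figure 3 caption can be illustrated with a deliberately simplified sketch. It assumes a tabular value per case instead of a neural Q-function, a single-step update toward the observed reward, and a softmax over values as the retrieval policy. None of these choices is confirmed by the text above, and `QCaseMemory`, `write`, and `read_probs` are hypothetical names.

```python
import math


class QCaseMemory:
    """Hypothetical parametric case memory with one scalar value per case."""

    def __init__(self, lr=0.5, temperature=1.0):
        self.q = {}  # case id -> estimated value of retrieving that case
        self.lr = lr
        self.temperature = temperature

    def write(self, case_id, reward):
        # Online refinement: move the stored value toward the observed reward.
        old = self.q.get(case_id, 0.0)
        self.q[case_id] = old + self.lr * (reward - old)

    def read_probs(self):
        # Retrieval policy: softmax over the current values (assumed form).
        if not self.q:
            return {}
        best = max(self.q.values())  # subtract the max for numerical stability
        weights = {c: math.exp((v - best) / self.temperature) for c, v in self.q.items()}
        total = sum(weights.values())
        return {c: w / total for c, w in weights.items()}


memory = QCaseMemory()
memory.write("case_search", 1.0)  # value moves 0.0 -> 0.5
memory.write("case_search", 1.0)  # value moves 0.5 -> 0.75
memory.write("case_code", 0.0)    # value stays at 0.0
probs = memory.read_probs()
print(probs)
```

Cases that led to successful outcomes accumulate higher values and are therefore retrieved with higher probability, which is the sense in which retrieval itself can be learned without touching the LLM.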

Theorems & Definitions (2)

  • Definition 3.1: Memory-Based Markov Decision Process
  • Definition 3.2: Case-Based Reasoning Agent