Memp: Exploring Agent Procedural Memory

Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang

TL;DR

To address brittle procedural memory in LLM-based agents, the paper introduces Mem^p, a framework that builds, retrieves, and updates a lifelong procedural memory from past trajectories. It distills experiences into fine-grained instructions and high-level scripts and is evaluated on TravelPlanner and ALFWorld with GPT-4o, Claude, and Qwen backbones, where accuracy and efficiency improve as the memory bank grows. The study also shows that procedural memory learned by a strong model transfers to weaker models, and that retrieving more memories improves performance up to a plateau. These results highlight the potential of lifelong memory ecosystems for robust, scalable agents in long-horizon tasks.

Abstract

Large Language Model (LLM)-based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose Memp, which distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions, and we explore the impact of different strategies for Build, Retrieval, and Update of procedural memory. Coupled with a dynamic regimen that continuously updates, corrects, and deprecates its contents, this repository evolves in lockstep with new experience. Empirical evaluation on TravelPlanner and ALFWorld shows that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks. Moreover, procedural memory built from a stronger model retains its value: migrating the procedural memory to a weaker model yields substantial performance gains.
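
The Build, Retrieval, and Update cycle described in the abstract can be pictured with a small sketch. This is illustrative only and is not the authors' implementation: the names `MemoryEntry`, `ProceduralMemory`, and `similarity`, the word-overlap retrieval score, and the failure-count deprecation rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    """One stored procedure (hypothetical layout, not taken from the paper)."""
    task: str            # task description, used here as the retrieval key
    steps: list[str]     # fine-grained, step-by-step instructions
    script: str          # higher-level, script-like abstraction
    successes: int = 1   # an entry is built from one successful trajectory
    failures: int = 0


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; an assumed stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class ProceduralMemory:
    entries: list[MemoryEntry] = field(default_factory=list)

    def build(self, task: str, trajectory: list[str], script: str) -> MemoryEntry:
        """Build: store a past trajectory in both granularities."""
        entry = MemoryEntry(task, list(trajectory), script)
        self.entries.append(entry)
        return entry

    def retrieve(self, query: str, k: int = 1) -> list[MemoryEntry]:
        """Retrieve: return the k entries whose task is most similar to the query."""
        ranked = sorted(self.entries,
                        key=lambda e: similarity(query, e.task),
                        reverse=True)
        return ranked[:k]

    def update(self, entry: MemoryEntry, success: bool, max_failures: int = 2) -> None:
        """Update: record the outcome and deprecate entries that keep failing."""
        if success:
            entry.successes += 1
            return
        entry.failures += 1
        # Assumed deprecation rule: drop once failures reach the limit
        # and outnumber successes.
        if entry.failures >= max_failures and entry.failures > entry.successes:
            self.entries.remove(entry)


memory = ProceduralMemory()
memory.build("heat an egg and put it on the table",
             ["take egg", "heat egg in microwave", "put egg on table"],
             "find object -> heat it -> place it")
memory.build("cool a mug and put it in the cabinet",
             ["take mug", "cool mug in fridge", "put mug in cabinet"],
             "find object -> cool it -> place it")

best = memory.retrieve("heat a potato and put it on the table", k=1)[0]
print(best.script)
```

In this sketch the new "heat a potato" task retrieves the stored "heat an egg" procedure because their task descriptions share the most words. A practical system would more likely rank entries with an embedding model, and could revise a failing entry rather than simply removing it.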

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: With procedural memory, agents can improve both the success rate (accuracy ↑) and execution efficiency (steps ↓) when solving similar tasks.
  • Figure 2: The procedural memory framework consists of Build, Retrieve, and Update, which respectively involve encoding stored procedural memory, forming new procedural memories, and modifying existing ones in light of new experiences.
  • Figure 3: Reward gain and steps reduction vs. trajectory group index with procedural memory.
  • Figure 4: (a) Result of transferring GPT-4o's procedural memory to Qwen2.5-14B-Instruct, and its performance on the TravelPlanner dataset. (b) The relationship between the quantity of procedural memory retrieved and GPT-4o's performance on the ALFWorld dataset.
  • Figure 5: Comparison of trajectories with and without procedural memory; using procedural memory shortens the process by 9 steps and saves 685 tokens.