Generalizable Self-Evolving Memory for Automatic Prompt Optimization

Guanbao Liang, Yuanchen Bei, Sheng Zhou, Yuheng Qin, Huan Zhou, Bingxin Jia, Bin Li, Jiajun Bu

Abstract

Automatic prompt optimization is a promising approach for adapting large language models (LLMs) to downstream tasks, yet existing methods typically search for a specific prompt specialized to a fixed task. This paradigm limits generalization across heterogeneous queries and prevents models from accumulating reusable prompting knowledge over time. In this paper, we propose MemAPO, a memory-driven framework that reconceptualizes prompt optimization as generalizable and self-evolving experience accumulation. MemAPO maintains a dual-memory mechanism that distills successful reasoning trajectories into reusable strategy templates while organizing incorrect generations into structured error patterns that capture recurrent failure modes. Given a new prompt, the framework retrieves both relevant strategies and failure patterns to compose prompts that promote effective reasoning while discouraging known mistakes. Through iterative self-reflection and memory editing, MemAPO continuously updates its memory, enabling prompt optimization to improve over time rather than restarting from scratch for each task. Experiments on diverse benchmarks show that MemAPO consistently outperforms representative prompt optimization baselines while substantially reducing optimization cost.
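To make the dual-memory idea concrete, here is a minimal sketch of a store that keeps strategy templates (from successes) and error patterns (from failures), retrieves both for a new query, and composes an augmented prompt. It is not the authors' implementation: word-overlap (Jaccard) retrieval stands in for whatever retriever MemAPO actually uses, and the names `DualMemory`, `retrieve`, and `compose_prompt` and the prompt wording are hypothetical.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set with surrounding punctuation stripped (crude similarity signal)."""
    words = (w.strip(".,?!:;").lower() for w in text.split())
    return {w for w in words if w}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap of two word sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class DualMemory:
    # Each entry is (key, payload): the key describes what kind of query it applies to.
    strategies: list[tuple[str, str]] = field(default_factory=list)      # distilled from successes
    error_patterns: list[tuple[str, str]] = field(default_factory=list)  # distilled from failures

    @staticmethod
    def retrieve(bank: list[tuple[str, str]], query: str, k: int) -> list[str]:
        """Return payloads of the top-k entries whose keys overlap the query at all."""
        q = tokens(query)
        scored = [(jaccard(q, tokens(key)), payload) for key, payload in bank]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep insertion order
        return [payload for _, payload in scored[:k]]

    def compose_prompt(self, query: str, k: int = 2) -> str:
        """Augment the query with retrieved strategies and warnings about known mistakes."""
        parts = ["Answer the question below."]
        strategies = self.retrieve(self.strategies, query, k)
        if strategies:
            parts.append("Useful strategies:\n" + "\n".join(f"- {s}" for s in strategies))
        errors = self.retrieve(self.error_patterns, query, k)
        if errors:
            parts.append("Avoid these known mistakes:\n" + "\n".join(f"- {e}" for e in errors))
        parts.append(f"Question: {query}")
        return "\n\n".join(parts)


if __name__ == "__main__":
    mem = DualMemory()
    mem.strategies.append(("arithmetic word problem with percentages",
                           "Convert the percentage to a decimal before multiplying."))
    mem.strategies.append(("logic puzzle about ordering",
                           "List the constraints, then place items one at a time."))
    mem.error_patterns.append(("arithmetic word problem",
                               "Do not drop units when combining quantities."))
    print(mem.compose_prompt("Solve this word problem: what is 15 percent of 80?"))
```

In the demo, the percentage strategy and the unit warning are both retrieved because their keys share "word" and "problem" with the query, while the unrelated ordering strategy is filtered out. A real system would more plausibly use embedding similarity, but the shape of the retrieve-then-compose step is the same.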

Paper Structure (55 sections, 8 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Comparison between MemAPO and existing works. (a) Existing paradigm: an external LLM iteratively refines prompts for one specific task. (b) MemAPO: self-organizes reusable memories across multiple tasks.
  • Figure 2: Comparison of performance and optimization costs across six automatic prompt optimization methods, six tasks, and two backbones. MemAPO achieves the best average performance across all datasets on both backbones while reducing optimization cost by approximately 57.2% compared with the strong baseline TextGrad.
  • Figure 3: Overall illustration of MemAPO. (I) Memory Retrieval: relevant strategies and failure patterns are retrieved for the given query. (II) Prompt Evaluation: the augmented prompt is constructed and iteratively evaluated. (III) Memory Update: newly acquired experiences are consolidated into memory.
  • Figure 4: Ablation study on the impact of the number of templates with GPT-4o-mini.
  • Figure 5: Meta prompt for answering query.
  • ...and 6 more figures
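The three stages in the Figure 3 caption (retrieval, evaluation, update) can be sketched as a single loop step. This is a hypothetical illustration, not MemAPO's actual procedure: an exact task-tag lookup replaces semantic retrieval, the caller-supplied `solve` and `reflect` functions are placeholders for LLM calls, and capping the strategy store by evicting the least-used tags is an assumed, deliberately simple form of memory editing.

```python
from collections import Counter
from typing import Callable

# solve(prompt) -> (answer, reasoning trajectory); reflect(query, answer, gold) -> failure note.
Solver = Callable[[str], tuple[str, str]]
Reflector = Callable[[str, str, str], str]


class SelfEvolvingMemory:
    """Hypothetical retrieve -> evaluate -> update loop; not the authors' implementation."""

    def __init__(self, capacity: int = 50):
        self.capacity = capacity
        self.strategies: dict[str, str] = {}  # task tag -> strategy template from a success
        self.errors: dict[str, str] = {}      # task tag -> failure pattern from a mistake
        self.usage = Counter()                # task tag -> number of successes

    def build_prompt(self, tag: str, query: str) -> str:
        # (I) Memory retrieval: an exact tag lookup stands in for semantic retrieval.
        lines = []
        if tag in self.strategies:
            lines.append(f"Strategy: {self.strategies[tag]}")
        if tag in self.errors:
            lines.append(f"Avoid: {self.errors[tag]}")
        lines.append(f"Question: {query}")
        return "\n".join(lines)

    def step(self, tag: str, query: str, gold: str, solve: Solver, reflect: Reflector) -> bool:
        answer, trajectory = solve(self.build_prompt(tag, query))  # (II) prompt evaluation
        correct = answer == gold
        if correct:  # (III) memory update: keep the successful trajectory as a template
            self.strategies[tag] = trajectory
            self.usage[tag] += 1
        else:        # (III) memory update: record what went wrong
            self.errors[tag] = reflect(query, answer, gold)
        self._evict()
        return correct

    def _evict(self) -> None:
        # Simple memory editing: cap the strategy store, dropping the least-used tags.
        while len(self.strategies) > self.capacity:
            victim = min(self.strategies, key=lambda t: self.usage[t])
            del self.strategies[victim]


if __name__ == "__main__":
    def toy_solve(prompt: str) -> tuple[str, str]:
        # Toy stand-in for an LLM: it only answers correctly once memory adds guidance.
        if "Avoid:" in prompt or "Strategy:" in prompt:
            return "12", "Convert 15% to 0.15, then compute 0.15 * 80."
        return "1200", "Multiplied 15 by 80."

    def toy_reflect(query: str, answer: str, gold: str) -> str:
        return f"Answered {answer} instead of {gold}: convert the percentage to a decimal first."

    mem = SelfEvolvingMemory()
    print([mem.step("percent", "What is 15% of 80?", "12", toy_solve, toy_reflect)
           for _ in range(3)])  # prints [False, True, True]
```

The toy run shows the intended effect of accumulation: the first attempt fails and leaves an error pattern behind, and later attempts on the same kind of query succeed because that pattern (and then a strategy template) is injected into the prompt instead of optimization restarting from scratch.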