UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory

Yongshi Ye, Hui Jiang, Feihu Jiang, Tian Lan, Yichao Du, Biao Fu, Xiaodong Shi, Qianghuai Jia, Longyue Wang, Weihua Luo

TL;DR

UMEM tackles the challenge of generalizable self-evolving memories for LLM-based agents by jointly optimizing memory extraction and management. It introduces Semantic Neighborhood Modeling to enforce cross-task generalization and uses a Marginal Utility Reward with Group Relative Policy Optimization (GRPO) to train the Mem-Optimizer, followed by Online Memory Evolution. Empirical results across five benchmarks show substantial improvements over competitive baselines: stronger executors and larger Mem-Optimizer models yield greater gains, and performance grows monotonically during continual interaction. The approach enables robust, executor-agnostic lifelong learning in open-ended environments and provides a scalable path toward generalizable, self-improving memory systems.

Abstract

Self-evolving memory serves as the trainable parameters for Large Language Model (LLM)-based agents, where extraction (distilling insights from experience) and management (updating the memory bank) must be tightly coordinated. Existing methods predominantly optimize memory management while treating memory extraction as a static process, resulting in poor generalization: agents accumulate instance-specific noise rather than robust memories. To address this, we propose Unified Memory Extraction and Management (UMEM), a self-evolving agent framework that jointly optimizes a Large Language Model to simultaneously extract and manage memories. To mitigate overfitting to specific instances, we introduce Semantic Neighborhood Modeling and optimize the model with a neighborhood-level marginal utility reward via Group Relative Policy Optimization (GRPO). This approach ensures memory generalizability by evaluating memory utility across clusters of semantically related queries. Extensive experiments across five benchmarks demonstrate that UMEM significantly outperforms highly competitive baselines, achieving up to a 10.67% improvement in multi-turn interactive tasks. Furthermore, UMEM maintains a monotonic growth curve during continuous evolution. Code and models will be publicly released.
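
The abstract describes the reward only in words, so the following is a minimal sketch of one way a neighborhood-level marginal utility reward and GRPO-style group-relative advantages could be computed. The averaging form of the reward, the use of the population standard deviation, the function names, and the toy scores are all assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean, pstdev
from typing import Mapping, Sequence


def marginal_utility(
    neighborhood: Sequence[str],
    score_without: Mapping[str, float],
    score_with: Mapping[str, float],
) -> float:
    """Average executor-score gain over the neighborhood (assumed reward form)."""
    gains = [score_with[q] - score_without[q] for q in neighborhood]
    return mean(gains)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list:
    """Normalize each reward by the mean and std of its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical toy data: four semantically related queries and the executor's
# success (1.0) or failure (0.0) on each, without and with a candidate memory.
neighborhood = ["q1", "q2", "q3", "q4"]
baseline = {"q1": 0.0, "q2": 1.0, "q3": 0.0, "q4": 0.0}
candidates = {
    "instance_specific": {"q1": 1.0, "q2": 1.0, "q3": 0.0, "q4": 0.0},
    "general_principle": {"q1": 1.0, "q2": 1.0, "q3": 1.0, "q4": 1.0},
    "misleading": {"q1": 0.0, "q2": 0.0, "q3": 0.0, "q4": 0.0},
}

# One reward per sampled candidate memory, then group-relative advantages.
rewards = [marginal_utility(neighborhood, baseline, s) for s in candidates.values()]
advantages = group_relative_advantages(rewards)

for name, r, a in zip(candidates, rewards, advantages):
    print(f"{name}: reward={r:+.2f} advantage={a:+.3f}")
```

In this toy group, the memory that helps only the query it was extracted from receives a smaller reward than the one that helps the whole neighborhood, which is the behavior the abstract attributes to evaluating utility across clusters of related queries.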

Paper Structure

This paper contains 22 sections, 1 theorem, 9 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Lemma 3.1

Let $e(\cdot)$ be $\ell_2$-normalized embeddings, i.e., $\|e(x)\|_2=1$. For any two queries $q_1,q_2$ and any candidate key $k$,

Figures (7)

  • Figure 1: Comparison between the conventional memory pipeline and our proposed UMEM framework. Left: Vanilla methods suffer from the "Rote Memorization" trap, overfitting to instance-specific noise. Right: UMEM utilizes a learnable Mem-Optimizer to jointly optimize extraction and management. This distills generalizable principles, ensuring robust performance and avoiding noise accumulation.
  • Figure 2: Overview of UMEM. Left: Semantic Neighborhood Modeling retrieves related queries to simulate cross-task variations. Right: The Mem-Optimizer distills trajectories from the frozen Executor into memory updates, which are optimized via GRPO. The process is guided by a Marginal Utility Reward that measures performance gains across the entire neighborhood to ensure generalization.
  • Figure 3: Cumulative performance over sequential tasks on the GPQA-Diamond and ALFWorld benchmarks.
  • Figure 4: Test-Time Self-Evolution on ALFWorld.
  • Figure 5: Success rate and average steps on the ALFWorld benchmark across different executor models.
  • ...and 2 more figures
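
The Figure 2 caption says that Semantic Neighborhood Modeling retrieves related queries. A minimal sketch of one plausible reading is top-k retrieval by cosine similarity over query embeddings. The function names, the choice of cosine similarity as the ranking score, and the toy two-dimensional embeddings are assumptions for illustration, not details confirmed by this excerpt.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, computed as the dot product of normalized vectors."""
    return sum(a * b for a, b in zip(l2_normalize(u), l2_normalize(v)))


def semantic_neighborhood(
    anchor: Sequence[float], pool: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Return indices of the k pool embeddings most similar to the anchor."""
    ranked = sorted(
        range(len(pool)), key=lambda i: cosine(anchor, pool[i]), reverse=True
    )
    return ranked[:k]


# Hypothetical toy embeddings for an anchor query and four pool queries.
anchor = [1.0, 0.0]
pool = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
neighbors = semantic_neighborhood(anchor, pool, k=2)
print(neighbors)
```

A real system would likely use a learned text encoder and an approximate nearest-neighbor index instead of the exhaustive sort shown here.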

Theorems & Definitions (2)

  • Lemma 3.1: Retrieval-score stability under cosine proximity
  • proof
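
The statement of Lemma 3.1 is cut off in this excerpt, so its exact form is not reproduced here. A standard bound consistent with the lemma's title follows from the Cauchy-Schwarz inequality for unit-norm embeddings: $|\langle e(q_1), e(k)\rangle - \langle e(q_2), e(k)\rangle| \le \|e(q_1) - e(q_2)\|_2 = \sqrt{2 - 2\cos(q_1, q_2)}$. The sketch below checks this inequality numerically on random unit vectors. It illustrates that standard bound only and may differ from the paper's statement; the helper names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def worst_violation(dim: int = 8, trials: int = 1000, seed: int = 0) -> float:
    """Largest observed (score gap - bound); non-positive if the bound holds."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        q1, q2, k = (unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(3))
        gap = abs(dot(q1, k) - dot(q2, k))
        # For unit vectors, ||q1 - q2||^2 = 2 - 2 * <q1, q2>.
        bound = math.sqrt(max(0.0, 2.0 - 2.0 * dot(q1, q2)))
        worst = max(worst, gap - bound)
    return worst


violation = worst_violation()
print(f"max(gap - bound) over trials: {violation:.3e}")
```

Intuitively, two queries that are close in cosine similarity assign nearly the same retrieval score to any candidate key, so they tend to retrieve similar memories.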