
DeltaMem: Towards Agentic Memory Management via Reinforcement Learning

Qi Zhang, Shen Huang, Chu Liu, Shouqing Yang, Junbo Zhao, Haobo Wang, Pengjun Xie

Abstract

Recent advances in persona-centric memory have revealed the powerful capability of multi-agent systems in managing persona memory, especially in conversational scenarios. However, these complex frameworks often suffer from information loss and are fragile across varying scenarios, resulting in suboptimal performance. In this paper, we propose DeltaMem, an agentic memory management system that formulates persona-centric memory management as an end-to-end task within a single-agent setting. To further improve the performance of our agentic memory manager, we draw inspiration from the evolution of human memory and synthesize a user-assistant dialogue dataset along with corresponding operation-level memory updating labels. Building on this, we introduce a novel Memory-based Levenshtein Distance to formalize the memory updating reward, and propose a tailored reinforcement learning framework to further enhance the management capabilities of DeltaMem. Extensive experiments show that both training-free and RL-trained DeltaMem outperform all product-level baselines across diverse long-term memory benchmarks, including LoCoMo, HaluMem, and PersonaMem.
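The abstract introduces a Memory-based Levenshtein Distance as the basis of the memory-updating reward. The paper's exact definition (Definition 3.1) is not reproduced in this excerpt, so the sketch below is only an illustrative assumption: it treats a memory state as a sequence of entries and scores a predicted state against a reference state by the minimum number of ADD / DELETE / UPDATE operations needed to align them, normalized into a [0, 1] reward. The function names `memory_levenshtein` and `memory_reward` are hypothetical, not from the paper.

```python
# Hypothetical sketch of a Levenshtein-style reward over memory states.
# Each memory state is a list of entry strings; edits correspond to the
# operations ADD (insert), DELETE, and UPDATE (substitute).

def memory_levenshtein(pred: list[str], gold: list[str]) -> int:
    """Minimum ADD/DELETE/UPDATE operations turning pred into gold."""
    m, n = len(pred), len(gold)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # DELETE all remaining predicted entries
    for j in range(n + 1):
        dp[0][j] = j  # ADD all remaining reference entries
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == gold[j - 1] else 1  # UPDATE if unequal
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # DELETE pred[i - 1]
                dp[i][j - 1] + 1,        # ADD gold[j - 1]
                dp[i - 1][j - 1] + cost, # KEEP or UPDATE
            )
    return dp[m][n]


def memory_reward(pred: list[str], gold: list[str]) -> float:
    """Normalize the edit distance into a reward in [0, 1]."""
    denom = max(len(pred), len(gold), 1)
    return 1.0 - memory_levenshtein(pred, gold) / denom
```

Under this sketch, a perfectly updated memory state earns reward 1.0, and each spurious, missing, or incorrect entry lowers the reward proportionally, giving the RL framework a dense operation-level signal rather than a binary match/no-match outcome.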

Paper Structure

This paper contains 50 sections, 9 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: i) Framework of DeltaMem. When ingesting a new session $k$, the memory manager will interact with the current memory state and finally generate a set of operations to update the current memory state. ii) Brief illustration of memory evolving guided data synthesis. iii) The reward modeling module of reinforcement learning via state transition.
  • Figure 2: Reward dynamics during RL training of DeltaMem-4B-RL (left) and DeltaMem-8B-RL (right).
  • Figure 3: Ablation study on local lexical fidelity.
  • Figure 4: Average cumulative accuracy on HaluMem as the number of sessions increases.
  • Figure 5: Average cumulative accuracy as a function of the number of sessions for LightMem and DeltaMem models with different model scales and training settings. The solid line shows the average accuracy over 20 users, and the shaded area indicates the variability across these users.
  • ...and 1 more figure

Theorems & Definitions (1)

  • Definition 3.1