In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents

Zhen Tan, Jun Yan, I-Hung Hsu, Rujun Han, Zifeng Wang, Long T. Le, Yiwen Song, Yanfei Chen, Hamid Palangi, George Lee, Anand Iyer, Tianlong Chen, Huan Liu, Chen-Yu Lee, Tomas Pfister

TL;DR

This work tackles the challenge of long-term personalization in dialogue agents by introducing Reflective Memory Management (RMM), which combines Prospective Reflection for topic-based memory organization with Retrospective Reflection for online retrieval refinement via LLM attribution. By organizing memories into coherent topics and continually refining retrieval through a lightweight RL-trained reranker guided by LLM-derived rewards, RMM improves both memory relevance and response quality across MSC and LongMemEval benchmarks. Key contributions include a detailed framework for topic-based memory extraction, a differentiable reranker with Gumbel-based sampling, and attribution-based reward signals that enable online adaptation without extensive labeled data. The results demonstrate consistent improvements over strong baselines, with analysis on granularity, offline pretraining, and different LLMs, highlighting RMM’s potential for robust, long-term personalization in real-world dialogue systems.

Abstract

Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information from long-term interactions limits their effectiveness in applications requiring sustained personalization. External memory mechanisms have been proposed to address this limitation, enabling LLMs to maintain conversational continuity. However, existing approaches struggle with two key challenges. First, rigid memory granularity fails to capture the natural semantic structure of conversations, leading to fragmented and incomplete representations. Second, fixed retrieval mechanisms cannot adapt to diverse dialogue contexts and user interaction patterns. In this work, we propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections: (1) Prospective Reflection, which dynamically summarizes interactions across granularities (utterances, turns, and sessions) into a personalized memory bank for effective future retrieval, and (2) Retrospective Reflection, which iteratively refines the retrieval in an online reinforcement learning (RL) manner based on LLMs' cited evidence. Experiments show that RMM demonstrates consistent improvement across various metrics and benchmarks. For example, RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.

Paper Structure

This paper contains 39 sections, 3 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: An illustration of a personalized healthcare agent. Key information about a user's allergy and previous symptoms mentioned in the past sessions is needed to provide a more informed response in the current session.
  • Figure 2: Illustration of Prospective Reflection. After each session, the agent decomposes and summarizes the session into specific topics. These newly generated memories are compared with existing memories in the memory bank. Relevant memories are merged, while others are directly added. Prospective reflection ensures efficient organization of personal knowledge for future retrieval.
  • Figure 3: Illustration of Retrospective Reflection. The Retriever fetches Top-$K$ memory entries from the memory bank, which are refined by the learnable Reranker to select the Top-$M$ most relevant entries. These entries are passed to the LLM along with the query to generate the final response. The LLM assigns binary citation scores ($+1$ for useful and $-1$ for not useful) to the retrieved memory entries based on their utility in the response. These scores are used as reward signals to update the reranker via an RL update, adapting the selection of relevant memory over time.
  • Figure 4: Granularity analysis on 100 randomly sampled instances from LongMemEval with the GTE retriever and Gemini-1.5-Flash generator. "Turn" and "Session" indicate retrieval at a fixed granularity. "Mix" represents retrieving from a pool combining both turns and sessions. "PR" refers to the granularity resulting from the proposed Prospective Reflection, while "Best" corresponds to selecting the optimal granularity (either turn or session) for each instance.
  • Figure 5: Impact of offline pretraining on retriever performance on the LongMemEval dataset, using the same 100 random samples as Figure 4. Results without offline pretraining are shown in blue, while results with offline pretraining are shown in orange. Offline pretraining improves recall and accuracy across all settings.
  • ...and 2 more figures
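
The Retrospective Reflection loop described in the Figure 3 caption (rerank Top-$K$ candidates, keep Top-$M$, receive $+1$/$-1$ citation scores, update the reranker) can be illustrated with a small sketch. This is not the authors' implementation: the linear scorer, the toy feature vectors, the learning rate, and the function names `gumbel_top_m` and `reinforce_update` are all assumptions made for illustration, and the update treats each selected entry as an independent draw from a softmax over the candidates, which is a simplification.

```python
import math
import random


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def softmax(scores):
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_top_m(scores, m, rng):
    """Sample m distinct indices by perturbing scores with Gumbel(0, 1) noise."""
    perturbed = []
    for idx, score in enumerate(scores):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
        perturbed.append((score - math.log(-math.log(u)), idx))
    perturbed.sort(reverse=True)
    return [idx for _, idx in perturbed[:m]]


def reinforce_update(weights, features, selected, rewards, lr=0.1):
    """One REINFORCE-style step for an assumed linear scorer score_i = w . x_i.

    For a softmax policy, the gradient of log p_i with respect to w is
    x_i minus the probability-weighted mean of all candidate features.
    """
    probs = softmax([dot(weights, x) for x in features])
    dim = len(weights)
    mean = [sum(p * x[d] for p, x in zip(probs, features)) for d in range(dim)]
    grad = [0.0] * dim
    for i, reward in zip(selected, rewards):
        for d in range(dim):
            grad[d] += reward * (features[i][d] - mean[d])
    return [w + lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    features = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # toy Top-K candidates, K = 3
    weights = [0.0, 0.0]
    scores = [dot(weights, x) for x in features]
    selected = gumbel_top_m(scores, m=2, rng=rng)
    # Hypothetical citation feedback: pretend the LLM cited only candidate 0.
    rewards = [1.0 if i == 0 else -1.0 for i in selected]
    weights = reinforce_update(weights, features, selected, rewards)
    print(selected, weights)
```

In this sketch, entries rewarded with $+1$ have their scores pushed up relative to the other candidates and entries rewarded with $-1$ are pushed down, so repeated feedback gradually shifts which memories reach the Top-$M$.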