On Memory Construction and Retrieval for Personalized Conversational Agents

Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Xufang Luo, Hao Cheng, Dongsheng Li, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Jianfeng Gao

TL;DR

The paper investigates how memory granularity affects retrieval-augmented response generation in long-term open-domain conversations. It introduces SeCom, a segment-level memory framework that uses a conversation segmentation model and compression-based denoising (LLMLingua-2) to enhance memory retrieval. Empirical results on LOCOMO and Long-MT-Bench+ show SeCom consistently outperforms turn-level, session-level, and summarization-based baselines, with strong robustness to different retrievers and segmentation models. The work also presents a segmentation evaluation setup with zero-shot and reflection-based improvements, demonstrating good transferability to standard dialogue segmentation benchmarks.
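
The segment-level pipeline in the TL;DR (segment the conversation, denoise each memory unit, retrieve the top units for a query) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SeCom's implementation. Segment boundaries are passed in by hand instead of being predicted by the segmentation model. A filler-word filter is a HYPOTHETICAL stand-in for LLMLingua-2 denoising. A small hand-written Okapi BM25 ranks the memory units (BM25 is one of the retrievers the paper evaluates; see Figure 4). The names `build_segments`, `denoise`, `BM25`, and `retrieve` are illustrative.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-in for compression-based denoising. SeCom applies
# LLMLingua-2; here we only drop a few filler words so the sketch runs offline.
FILLER = {"um", "uh", "oh", "yeah", "well", "like", "so", "just", "anyway"}


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def denoise(text):
    return [t for t in tokenize(text) if t not in FILLER]


def build_segments(turns, boundaries):
    """Join consecutive turns into segment-level memory units.

    `boundaries` are turn indices where a new segment starts; in SeCom these
    would come from the conversation segmentation model (assumed given here).
    """
    starts = [0] + sorted(b for b in set(boundaries) if 0 < b < len(turns))
    ends = starts[1:] + [len(turns)]
    return [" ".join(turns[s:e]) for s, e in zip(starts, ends)]


class BM25:
    """Minimal Okapi BM25 over pre-tokenized documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs, self.k1, self.b = docs, k1, b
        self.avgdl = (sum(len(d) for d in docs) / len(docs)) or 1.0
        df = Counter(t for d in docs for t in set(d))
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        self.tf = [Counter(d) for d in docs]

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(
            self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
            for t in query
            if t in tf
        )


def retrieve(memory_units, query, k):
    """Return the top-k memory units for `query`; k plays the role of the context budget."""
    index = BM25([denoise(u) for u in memory_units])
    q = denoise(query)
    ranked = sorted(range(len(memory_units)), key=lambda i: index.score(q, i), reverse=True)
    return [memory_units[i] for i in ranked[:k]]


turns = [
    "User: I adopted a puppy named Max last week.",
    "Bot: Congrats! What breed is Max?",
    "User: He's a golden retriever, so full of energy.",
    "User: Anyway, I'm planning a trip to Kyoto in April.",
    "Bot: Kyoto in April means cherry blossoms!",
]
segments = build_segments(turns, boundaries=[3])
print(retrieve(segments, "What breed is my dog Max?", k=1))
```

Replacing `build_segments` with one unit per turn, or one unit per session, yields the turn-level and session-level baselines under the same retrieval code, which is what makes the granularity comparison controlled.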

Abstract

To deliver coherent and personalized experiences in long-term conversations, existing approaches typically perform retrieval-augmented response generation by constructing memory banks from conversation history at the turn level or session level, or through summarization techniques. In this paper, we present two key findings: (1) The granularity of memory units matters: turn-level, session-level, and summarization-based methods each exhibit limitations in both memory retrieval accuracy and the semantic quality of the retrieved content. (2) Prompt compression methods, such as LLMLingua-2, can effectively serve as a denoising mechanism, enhancing memory retrieval accuracy across different granularities. Building on these insights, we propose SeCom, a method that constructs the memory bank at the segment level by introducing a conversation segmentation model that partitions long-term conversations into topically coherent segments, while applying compression-based denoising to memory units to enhance memory retrieval. Experimental results show that SeCom significantly outperforms baselines on the long-term conversation benchmarks LOCOMO and Long-MT-Bench+. Additionally, the proposed conversation segmentation method demonstrates superior performance on dialogue segmentation datasets such as DialSeg711, TIAGE, and SuperDialSeg.
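
Dialogue segmentation output is commonly scored with the Pk error metric, a standard text-segmentation measure. This summary does not list the paper's exact segmentation metrics, so the sketch below is an assumption about how such an evaluation can be computed, not a description of the authors' setup. Labels are per-turn segment labels, and the window size k defaults to half the mean reference segment length, a common convention. The helper names are illustrative.

```python
def segment_ids(labels):
    """Map per-turn labels to increasing segment ids (a new id at every label change)."""
    ids, current = [], 0
    for i, label in enumerate(labels):
        if i > 0 and label != labels[i - 1]:
            current += 1
        ids.append(current)
    return ids


def pk(reference, hypothesis, k=None):
    """Pk segmentation error in [0, 1]; lower is better."""
    if len(reference) != len(hypothesis) or not reference:
        raise ValueError("need two non-empty label sequences of equal length")
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    n = len(ref)
    if k is None:
        # Common convention: half the mean reference segment length.
        k = max(1, round(n / (ref[-1] + 1) / 2))
    if n <= k:
        raise ValueError("sequence too short for window size k")
    # Slide a window of width k and count disagreements on "same segment?".
    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k]) for i in range(n - k)
    )
    return errors / (n - k)


reference = [0, 0, 0, 1, 1, 1]  # boundary before turn 3
print(pk(reference, [0, 0, 0, 1, 1, 1]))  # perfect segmentation → 0.0
print(pk(reference, [0, 0, 0, 0, 0, 0]))  # missed boundary → 0.5
```

Converting labels to segment ids first means a label reused later in the conversation (for example a topic the user returns to) still counts as a new segment, which is what a boundary-based metric expects.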

Paper Structure

This paper contains 33 sections, 7 equations, 16 figures, 11 tables.

Figures (16)

  • Figure 1: Illustration of retrieval-augmented response generation with different memory granularities. Turn-level memory is too fine-grained, leading to fragmentary and incomplete context. Session-level memory is too coarse-grained, containing too much irrelevant information. Summary-based methods suffer from information loss during summarization. Ours (segment-level memory) better captures topically coherent units in long conversations, striking a balance between including relevant, coherent information and excluding irrelevant content. Bullseye $\odot$ indicates the retrieved memory units at turn level or segment level under the same context budget. [0.xx]: similarity between the target query and history content. Turn-level retrieval errors: false negative, false positive.
  • Figure 2: The impact of memory granularity on the response quality (a) and retrieval accuracy (b, c).
  • Figure 3: Prompt compression method (LLMLingua-2) can serve as an effective denoising technique to enhance the memory retrieval system by: (a) improving the retrieval recall with varying context budget $K$; (b) benefiting the retrieval system by increasing the similarity between the query and relevant segments while decreasing the similarity with irrelevant ones.
  • Figure 4: GPT-4-based pairwise performance comparison on LOCOMO with a BM25-based retriever.
  • Figure 5: Performance comparison of different memory granularities with various context budgets on Long-MT-Bench+.
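
Figure 3(a) reports retrieval recall as the context budget K varies. The page does not define recall precisely; the sketch below assumes the usual set-based definition (the share of gold-relevant memory units that appear among the top K retrieved), averaged over queries. The unit ids and the per-query `runs` format are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant memory units found among the top-k retrieved units."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must be non-empty")
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


def mean_recall_curve(runs, budgets):
    """Average recall@K over queries for each budget K.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    return {
        k: sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)
        for k in budgets
    }


runs = [
    (["s3", "s1", "s7", "s2"], ["s1", "s2"]),  # query 1: two relevant units
    (["s5", "s4", "s6", "s8"], ["s5"]),        # query 2: one relevant unit
]
print(mean_recall_curve(runs, budgets=[1, 2, 4]))  # → {1: 0.5, 2: 0.75, 4: 1.0}
```

Because turn-level and segment-level units differ in length, a fair cross-granularity comparison would fix the budget in tokens rather than in units; this sketch counts units for simplicity.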