PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering

Yiming Du, Hongru Wang, Zhengyi Zhao, Bin Liang, Baojun Wang, Wanjun Zhong, Zezhong Wang, Kam-Fai Wong

TL;DR

PerLTQA tackles the challenge of incorporating personal long-term memory into QA by introducing a dataset that fuses semantic memories (profiles, social ties) with episodic memories (events, dialogues) and a three-stage memory integration framework (classification, retrieval, synthesis). The approach is evaluated across five LLMs and three retrievers, revealing that BERT-based memory classification outperforms several LLMs in memory type prediction and that effective memory retrieval is critical for accurate, memory-grounded responses. Empirical results show meaningful improvements in memory-informed synthesis (MAP up to 0.756, correctness up to 0.573) and demonstrate the practical viability of memory-augmented QA, even with smaller models. The dataset provides rich memory-anchored QA content (141 profiles, 1,339 relationships, 4,501 events, 3,409 dialogues, 8,593 QA pairs) and a rigorous evaluation protocol, offering a valuable benchmark for personalized, memory-aware NLP systems and future memory-integrated dialogue agents.
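For readers unfamiliar with the retrieval metric quoted above, the following is a minimal sketch of mean average precision (MAP) over ranked memory lists. It uses the standard textbook definition; the paper's exact variant (cutoff, tie handling) is not stated in this summary, so treat the details as an assumption. The function names and memory ids (`m1`, `m2`, ...) are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per question."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical questions: the gold memory is ranked 2nd, then 1st.
runs = [
    (["m3", "m1", "m7"], {"m1"}),  # AP = 1/2
    (["m2", "m5"], {"m2"}),        # AP = 1/1
]
print(mean_average_precision(runs))  # 0.75
```

A MAP of 0.756 therefore means that, averaged over questions, the relevant memories sit close to the top of the ranked list.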

Abstract

Long-term memory plays a critical role in personal interaction: a system that draws on it can make better use of world knowledge, historical information, and preferences in dialogue. We introduce PerLTQA, a QA dataset that combines semantic and episodic memories, covering world knowledge, profiles, social relationships, events, and dialogues. The dataset was collected to study the use of personalized memories in the QA task, with a focus on social interactions and events. PerLTQA features two types of memory and a benchmark of 8,593 questions for 30 characters, facilitating the exploration and application of personalized memories in Large Language Models (LLMs). Based on PerLTQA, we propose a framework for memory integration and generation with three main components: Memory Classification, Memory Retrieval, and Memory Synthesis. We evaluate this framework using five LLMs and three retrievers. Experimental results show that BERT-based classification models significantly outperform LLMs such as ChatGLM3 and ChatGPT on the memory classification task. Our study also highlights the importance of effective memory integration in the QA task.
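The three components named in the abstract can be pictured as a small pipeline. The sketch below is a toy illustration, not the authors' implementation: a keyword rule stands in for the trained classifier, word overlap stands in for the retrievers, and the final stage only builds the prompt that would be passed to an LLM. Using the predicted memory types to filter retrieval candidates is an assumption made here for simplicity, and all names (`Memory`, `classify`, `retrieve`, `build_prompt`) and the sample memories are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Memory:
    kind: str  # "profile", "relationship", "event", or "dialogue"
    text: str


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def classify(question):
    """Stage 1: predict which memory types the question needs (toy keyword rule)."""
    episodic_cues = {"when", "happened", "said", "talked"}
    if tokenize(question) & episodic_cues:
        return {"event", "dialogue"}        # episodic memory
    return {"profile", "relationship"}      # semantic memory


def retrieve(question, memories, kinds, k=2):
    """Stage 2: rank memories of the predicted types by word overlap, keep top k."""
    q = tokenize(question)
    candidates = [m for m in memories if m.kind in kinds]
    ranked = sorted(candidates, key=lambda m: len(q & tokenize(m.text)), reverse=True)
    return ranked[:k]


def build_prompt(question, retrieved):
    """Stage 3: assemble the memory-grounded prompt an LLM would answer from."""
    context = "\n".join(f"[{m.kind}] {m.text}" for m in retrieved)
    return f"Memories:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hypothetical memories for an invented character.
memories = [
    Memory("profile", "Alice works as a nurse; her job is at a city hospital"),
    Memory("relationship", "Bob is Alice's brother"),
    Memory("event", "Alice ran a marathon in 2021"),
]
question = "What is Alice's job at the hospital?"
top = retrieve(question, memories, classify(question), k=1)
print(build_prompt(question, top))
```

Swapping the keyword rule for a fine-tuned classifier and the overlap score for a sparse or dense retriever would leave the interface between the three stages unchanged.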

Paper Structure (22 sections, 5 equations, 6 figures, 8 tables)

Figures (6)

  • Figure 1: The PerLT Memory generation process, in six steps: (1) seed data collection, (2) profile (PRO) generation, (3) social relationship (SR) generation, (4) event (EVT) generation, (5) dialogue (DLG) generation, and (6) validation.
  • Figure 2: The framework of memory classification, memory retrieval and memory synthesis in QA.
  • Figure 3: Comparative analysis of response performance with no retrieval (NR), incorrect retrieval (IR), and correct retrieval (CR).
  • Figure 4: Prompts for the PRO, SR, EVT, and DLG memory generators.
  • Figure 5: Prompts for question-answer generation and memory anchor candidate search.
  • ...and 1 more figure