
HyperMem: Hypergraph Memory for Long-Term Conversations

Juwei Yue, Chuanrui Hu, Jiawei Sheng, Zuyi Zhou, Wenyuan Zhang, Tingwen Liu, Li Guo, Yafeng Deng

Abstract

Long-term memory is essential for conversational agents to maintain coherence, track persistent tasks, and provide personalized interactions across extended dialogues. However, existing approaches such as Retrieval-Augmented Generation (RAG) and graph-based memory mostly rely on pairwise relations, which can hardly capture high-order associations, i.e., joint dependencies among multiple elements, leading to fragmented retrieval. To this end, we propose HyperMem, a hypergraph-based hierarchical memory architecture that explicitly models such associations using hyperedges. Specifically, HyperMem structures memory into three levels: topics, episodes, and facts, and groups related episodes and their facts via hyperedges, unifying scattered content into coherent units. Leveraging this structure, we design a hybrid lexical-semantic index and a coarse-to-fine retrieval strategy, supporting accurate and efficient retrieval of high-order associations. Experiments on the LoCoMo benchmark show that HyperMem achieves state-of-the-art performance with 92.73% LLM-as-a-judge accuracy, demonstrating its effectiveness for long-term conversations.
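The three-level memory and coarse-to-fine retrieval described in the abstract can be sketched as a minimal data structure. This is an illustrative approximation, not the paper's implementation: all class and method names (`HyperMemStore`, `retrieve`, etc.) are hypothetical, and word-overlap scoring stands in for the hybrid lexical-semantic index.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: topics act as hyperedges that group related
# episodes, and each episode carries its extracted facts.
@dataclass
class Episode:
    text: str
    facts: list[str] = field(default_factory=list)

@dataclass
class Topic:
    name: str
    episodes: list[Episode] = field(default_factory=list)  # one hyperedge

class HyperMemStore:
    def __init__(self):
        self.topics: list[Topic] = []

    def add(self, topic_name: str, episode_text: str, facts: list[str]) -> None:
        topic = next((t for t in self.topics if t.name == topic_name), None)
        if topic is None:
            topic = Topic(topic_name)
            self.topics.append(topic)
        topic.episodes.append(Episode(episode_text, list(facts)))

    def _overlap(self, query: str, text: str) -> float:
        # Toy lexical score; the paper uses a hybrid lexical-semantic index.
        q, t = set(query.lower().split()), set(text.lower().split())
        return len(q & t) / max(len(q), 1)

    def retrieve(self, query: str, top_topics: int = 1, top_episodes: int = 2) -> list[str]:
        # Coarse: rank topics (hyperedges) against the query.
        topics = sorted(self.topics,
                        key=lambda t: self._overlap(query, t.name),
                        reverse=True)[:top_topics]
        # Fine: rank episodes within the selected topics, return their facts.
        episodes = [e for t in topics for e in t.episodes]
        episodes.sort(key=lambda e: self._overlap(query, e.text), reverse=True)
        return [f for e in episodes[:top_episodes] for f in e.facts]
```

Grouping episodes under a topic hyperedge (rather than linking them pairwise) is what lets one retrieval step surface all facts that jointly belong to a theme.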

Paper Structure

This paper contains 42 sections, 4 equations, 13 figures, 2 tables, 3 algorithms.

Figures (13)

  • Figure 1: Memory structure comparison across Chunk-based RAG, Graph-based RAG, and our HyperMem.
  • Figure 2: Framework of HyperMem. The indexing detects episode boundaries, aggregates topics via hyperedges, and extracts facts. The retrieval performs coarse-to-fine search from topics to episodes to facts.
  • Figure 3: Ablation study across four question categories. FC: Fact context. EC: Episode context. TR: Topic-level retrieval. ER: Episode-level retrieval. The shaded region highlights the full HyperMem configuration.
  • Figure 4: Hyperparameter sensitivity analysis on LoCoMo. We evaluate the impact of embedding fusion weight $\alpha$ and Top-k selection at each hierarchical level (Topic, Episode, Fact) on retrieval performance.
  • Figure 5: Token usage vs. accuracy comparison. The x-axis shows relative token usage (Mem0 as 1.0$\times$ baseline), and the y-axis shows LLM-as-a-judge accuracy.
  • ...and 8 more figures