PersonalAI: A Systematic Comparison of Knowledge Graph Storage and Retrieval Approaches for Personalized LLM agents

Mikhail Menschikov, Dmitry Evseev, Victoria Dochkina, Ruslan Kostoev, Ilia Perepechkin, Petr Anokhin, Evgeny Burnaev, Nikita Semenov

TL;DR

This work tackles the challenge of long-horizon personalization for large language models by introducing a flexible external memory grounded in a knowledge graph. Built on the AriGraph framework, the system integrates semantic (object, thesis) and episodic memory with hyper-edges, enabling rich temporal and relational reasoning. It offers multiple retrieval algorithms (A*, WaterCircles, BeamSearch and hybrids) and demonstrates how memory configuration interacts with model scale to affect QA performance across DiaASQ, HotpotQA, and TriviaQA, including temporal and contradictory information. The results show that thesis memories are highly informative for small-to-medium models, while hybrid retrieval strategies provide stability for larger models, and that this graph-based memory framework can surpass GraphRAG in certain settings while remaining competitive with RAG baselines when appropriately tuned. The findings illuminate how structured memory and flexible retrieval can enhance personalized, context-aware reasoning at scale, and point to future directions in temporal filtering, distributed memory storage, and privacy-preserving retrieval.

Abstract

Personalizing language models by effectively incorporating user interaction history remains a central challenge in the development of adaptive AI systems. While large language models (LLMs) combined with Retrieval-Augmented Generation (RAG) have improved factual accuracy, they often lack structured memory and fail to scale in complex, long-term interactions. To address this, we propose a flexible external memory framework based on a knowledge graph, which is constructed and updated automatically by the LLM itself. Building upon the AriGraph architecture, we introduce a novel hybrid graph design that supports both standard edges and two types of hyper-edges, enabling rich and dynamic semantic and temporal representations. Our framework also supports diverse retrieval mechanisms, including A*, water-circle traversal, beam search, and hybrid methods, making it adaptable to different datasets and LLM capacities. We evaluate our system on three benchmarks (TriviaQA, HotpotQA, and DiaASQ) and demonstrate that the optimal memory and retrieval configuration depends on the task. Additionally, we extend the DiaASQ benchmark with temporal annotations and internally contradictory statements, showing that our system remains robust and effective in managing temporal dependencies and context-aware reasoning.
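
To make the hybrid graph design more concrete, here is a minimal sketch of a memory graph with three vertex kinds, standard edges, and hyper-edges. Everything in it is an illustrative assumption rather than the authors' implementation: the names `VertexKind`, `Vertex`, `MemoryGraph`, and the choice to model a hyper-edge as a hub vertex (a thesis or episodic vertex) linked to a set of member vertices are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class VertexKind(Enum):
    OBJECT = "object"      # semantic memory: entities
    THESIS = "thesis"      # semantic memory: statements about entities
    EPISODIC = "episodic"  # episodic memory: a source passage or dialogue turn


@dataclass(frozen=True)
class Vertex:
    id: str
    kind: VertexKind
    text: str


@dataclass
class MemoryGraph:
    vertices: dict[str, Vertex] = field(default_factory=dict)
    # Standard edges stored as (subject_id, relation, object_id) triples.
    edges: list[tuple[str, str, str]] = field(default_factory=list)
    # Hyper-edges (assumed representation): hub vertex id -> member vertex ids.
    hyper_edges: dict[str, set[str]] = field(default_factory=dict)

    def add_vertex(self, vid: str, kind: VertexKind, text: str) -> None:
        self.vertices[vid] = Vertex(vid, kind, text)

    def _require(self, *ids: str) -> None:
        # Reject edges that point at vertices the graph does not contain.
        for vid in ids:
            if vid not in self.vertices:
                raise KeyError(f"unknown vertex: {vid}")

    def add_edge(self, subject_id: str, relation: str, object_id: str) -> None:
        self._require(subject_id, object_id)
        self.edges.append((subject_id, relation, object_id))

    def add_hyper_edge(self, hub_id: str, member_ids: set[str]) -> None:
        self._require(hub_id, *member_ids)
        self.hyper_edges.setdefault(hub_id, set()).update(member_ids)

    def neighbors(self, vid: str) -> set[str]:
        """Vertices reachable in one step via a standard edge or a hyper-edge."""
        found: set[str] = set()
        for subject_id, _, object_id in self.edges:
            if subject_id == vid:
                found.add(object_id)
            elif object_id == vid:
                found.add(subject_id)
        for hub_id, members in self.hyper_edges.items():
            if hub_id == vid:
                found |= members
            elif vid in members:
                found.add(hub_id)
        return found
```

In this sketch a thesis vertex can tie together every object it mentions, and an episodic vertex can tie together everything extracted from one passage, so a single traversal step moves between semantic and episodic memory.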

Paper Structure

This paper contains 26 sections, 3 figures, 20 tables.

Figures (3)

  • Figure 1: Example of a graph fragment, constructed from natural language text using our method, with object (green), thesis (yellow) and episodic (blue) vertices
  • Figure 2: High-level architecture of the proposed Memorize pipeline for LLM-based triple extraction from unstructured natural-language text and memory construction
  • Figure 3: High-level architecture of the proposed QA pipeline for generating answers to questions based on the constructed memory graph
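
Beam search is one of the retrieval mechanisms named in the abstract. The sketch below shows the general technique on a toy graph and is not the paper's algorithm: the adjacency-dictionary representation, the word-overlap scoring function `overlap_score`, and the parameters `beam_width` and `max_depth` are all assumptions made for illustration. A real system would more plausibly score vertices with embeddings.

```python
def overlap_score(question: str, text: str) -> float:
    """Hypothetical relevance score: share of question words found in the text."""
    question_words = set(question.lower().split())
    text_words = set(text.lower().split())
    if not question_words:
        return 0.0
    return len(question_words & text_words) / len(question_words)


def beam_search(
    adjacency: dict[str, list[str]],
    texts: dict[str, str],
    start: str,
    question: str,
    beam_width: int = 2,
    max_depth: int = 2,
) -> list[tuple[float, list[str]]]:
    """Return the best-scoring paths from `start`, keeping `beam_width` per depth."""
    beam = [(overlap_score(question, texts[start]), [start])]
    best = list(beam)
    for _ in range(max_depth):
        candidates = []
        for score, path in beam:
            for neighbor in adjacency.get(path[-1], ()):
                if neighbor in path:
                    continue  # avoid cycles
                new_score = score + overlap_score(question, texts[neighbor])
                candidates.append((new_score, path + [neighbor]))
        if not candidates:
            break
        # Prune: only the top `beam_width` partial paths are expanded further.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:beam_width]
        best.extend(beam)
    best.sort(key=lambda item: item[0], reverse=True)
    return best[:beam_width]
```

The pruning step is what separates beam search from an exhaustive traversal: the cost per depth is bounded by `beam_width`, at the price of possibly discarding a path that would have scored well later.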