AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Andrey Kravchenko, Mikhail Burtsev, Evgeny Burnaev

TL;DR

This work introduces AriGraph, a memory graph that fuses semantic knowledge with episodic memories to enable structured world modeling for LLM agents. Coupled with the Ariadne cognitive architecture, it supports planning, decision-making, and graph-guided navigation in interactive environments. Empirical evaluation across TextWorld, NetHack, and multi-hop QA demonstrates that AriGraph-empowered agents outperform unstructured memory baselines and RL agents, achieving near-human performance in some tasks and competitive QA results at lower cost. The results underscore the value of integrated, graph-based memories for scalable reasoning and exploration in partially observable domains, with avenues for multimodal extensions and richer graph-search strategies.

Abstract

Advancements in the capabilities of Large Language Models (LLMs) have created a promising foundation for developing autonomous agents. With the right tools, these agents could learn to solve tasks in new environments by accumulating and updating their knowledge. Current LLM-based agents process past experiences using a full history of observations, summarization, or retrieval augmentation. However, these unstructured memory representations do not facilitate the reasoning and planning essential for complex decision-making. In our study, we introduce AriGraph, a novel method wherein the agent constructs and updates a memory graph that integrates semantic and episodic memories while exploring the environment. We demonstrate that our Ariadne LLM agent, consisting of the proposed memory architecture augmented with planning and decision-making, effectively handles complex tasks within interactive text game environments that are difficult even for human players. Results show that our approach markedly outperforms other established memory methods and strong RL baselines in a range of problems of varying complexity. Additionally, AriGraph demonstrates competitive performance compared to dedicated knowledge graph-based methods in static multi-hop question-answering.

Paper Structure (20 sections, 1 equation, 14 figures, 4 tables, 3 algorithms)

Figures (14)

  • Figure 1: (A) The architecture of our Ariadne agent, equipped with AriGraph memory. AriGraph integrates both a semantic knowledge graph and past experiences. Memory in the form of a semantic knowledge graph extended with episodic vertices and edges significantly enhances the performance of the LLM agent in text-based games. (B) The average performance of our agent on text games, compared to various baselines including human players and other LLM memory implementations. The LLM agents differ only in the memory module, while the decision-making component remains identical across all versions. The results for the agents are displayed for the top three out of five runs. For human players, the results are presented as both the top three and the average across all participants.
  • Figure 2: AriGraph world model and Ariadne cognitive architecture. (A) AriGraph learns episodic and semantic knowledge during interaction with an unknown environment. At each time step $t$, a new episodic vertex (containing the full textual observation $o_t$) is added to the episodic memory. An LLM then parses the observation $o_t$ to extract relevant relationships in the form of triplets $(object_{1},\, relation,\, object_{2})$. These triplets are used to update the semantic memory graph. Episodic and semantic memory are connected through episodic edges that link each episodic vertex with all triplets extracted from the respective observation. (B) The Ariadne agent explores the environment and accomplishes tasks with AriGraph. The user sets a goal for the agent. Working memory is populated with the recent history of observations and actions, along with relevant semantic and episodic knowledge retrieved from the AriGraph world model. The planning LLM module uses the content of working memory to generate a new plan or update an existing one. The results of planning are stored back in working memory. Finally, a ReAct-based module reads the memory content and selects one of the possible actions to be executed in the environment. Every observation triggers learning that updates the agent's world model.
  • Figure 3: The AriGraph world model enables the Ariadne agent to successfully solve a variety of text games. (A) Ariadne outperforms baseline agents with alternative types of memory. (B) Ariadne with episodic and semantic memory scales to harder environments without losing performance. (C) Ariadne shows performance comparable to the best human players. The Y-axis shows the normalized score, which is calculated relative to the maximum possible points that can be obtained in each environment. Error bars show standard deviation. The maximum number of steps is set to 60 in the Cooking game and to 150 in the other games.
  • Figure 4: The Ariadne LLM agent shows top performance compared to RL alternatives. Comparison of Ariadne and the Full History baseline (GPT-4) with RL baselines in the cooking benchmark. Ariadne demonstrates superior performance across all 4 difficulty levels.
  • Figure 5: AriGraph demonstrates good scaling during learning and with environment size. The size of the knowledge graph quickly saturates during the exploration and learning phase. The KG grows moderately when the Treasure Hunt and Cooking games include more rooms and objects in their hard versions.
  • ...and 9 more figures
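
The memory update described in the Figure 2 caption (one episodic vertex per observation, triplets merged into the semantic graph, episodic edges linking the two) can be pictured with a small sketch. This is an illustrative toy, not the authors' implementation: the class `MemoryGraph`, its field names, and the method `episodes_mentioning` are hypothetical, and the LLM-based triplet extraction is assumed to have already happened, so triplets are passed in by hand.

```python
from dataclasses import dataclass, field

# (object_1, relation, object_2), following the triplet form in the Figure 2 caption.
Triplet = tuple[str, str, str]


@dataclass
class MemoryGraph:
    """Hypothetical toy memory graph with semantic and episodic parts."""

    # Semantic memory: the set of all triplets seen so far.
    semantic_edges: set[Triplet] = field(default_factory=set)
    # Episodic memory: one vertex per time step, holding the full observation text.
    episodic_vertices: list[str] = field(default_factory=list)
    # Episodic edges: time step t -> triplets extracted from observation o_t.
    episodic_edges: dict[int, set[Triplet]] = field(default_factory=dict)

    def update(self, observation: str, triplets: list[Triplet]) -> int:
        """Store observation o_t and link it to its triplets; returns t."""
        t = len(self.episodic_vertices)
        self.episodic_vertices.append(observation)
        self.semantic_edges.update(triplets)
        self.episodic_edges[t] = set(triplets)
        return t

    def episodes_mentioning(self, entity: str) -> list[int]:
        """Time steps whose extracted triplets involve the given entity."""
        return [
            t
            for t, linked in self.episodic_edges.items()
            if any(entity in (subj, obj) for subj, _, obj in linked)
        ]


# Assumption: these triplets stand in for what an LLM parser would extract.
graph = MemoryGraph()
graph.update(
    "You are in the kitchen. A knife is on the table.",
    [("knife", "is on", "table"), ("table", "is in", "kitchen")],
)
graph.update(
    "You go east and enter the living room.",
    [("living room", "east of", "kitchen")],
)
print(graph.episodes_mentioning("kitchen"))
```

Keeping the episodic edges as a mapping from time step to triplets lets a lookup start from a semantic entity and recover the full observations in which it appeared, which is the kind of cross-link between the two memory types that the caption describes.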