Affordable AI Assistants with Knowledge Graph of Thoughts

Maciej Besta, Lorenzo Paleari, Jia Hao Andrea Jiang, Robert Gerstenberger, You Wu, Jón Gunnar Hannesson, Patrick Iff, Ales Kubicek, Piotr Nyczyk, Diana Khimey, Nils Blach, Haiqiang Zhang, Tao Zhang, Peiran Ma, Grzegorz Kwaśniewski, Marcin Copik, Hubert Niewiadomski, Torsten Hoefler

TL;DR

This work tackles the high cost and limited success of large LLM-driven agents by introducing Knowledge Graph of Thoughts (KGoT), a modular AI assistant architecture that constructs and evolves task-specific knowledge graphs to guide reasoning. KGoT combines a dual-LLM controller with a Graph Store supporting multiple KG representations (property, RDF, adjacency list) and a versatile tool suite, enabling iterative knowledge acquisition and structured query-based or script-based retrieval. Empirical results on GAIA and SimpleQA show that KGoT achieves higher task success rates while substantially reducing costs (e.g., up to 36x cheaper than GPT-4o without sacrificing performance), and that externalizing reasoning into a KG improves transparency, bias mitigation, and robustness via Self-Consistency and LLM-as-a-Judge. The approach demonstrates strong scalability through asynchronous execution and MPI-based distribution, and lays groundwork for applying KG-based reasoning to diverse, complex domains with external compute workflows.
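To make the idea of multiple KG representations concrete, the following minimal sketch stores the same toy graph as RDF-style triples and as an adjacency list. It is an illustration only, not the KGoT Graph Store: the facts, the function names `to_adjacency` and `neighbors`, and the data layout are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical toy facts, invented for illustration (not taken from the paper).
triples = [
    ("task", "asks_about", "video"),
    ("video", "has_narrator", "unknown_person"),
    ("video", "hosted_on", "website"),
]


def to_adjacency(triples):
    """Convert (subject, predicate, object) triples into an adjacency list."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return dict(adjacency)


def neighbors(adjacency, node, predicate=None):
    """Return objects linked from `node`, optionally filtered by predicate."""
    return [
        obj
        for pred, obj in adjacency.get(node, [])
        if predicate is None or pred == predicate
    ]


adjacency = to_adjacency(triples)
```

The triple form is convenient for appending new facts one at a time, while the adjacency form makes per-node lookups such as `neighbors(adjacency, "video", "hosted_on")` cheap. Which representation suits a given task is a design choice this sketch does not settle.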

Abstract

Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. However, current state-of-the-art LLM-driven agents face significant challenges, including high operational costs and limited success rates on complex benchmarks like GAIA. To address these issues, we propose Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT extracts and structures task-relevant knowledge into a dynamic KG representation, iteratively enhanced through external tools such as math solvers, web crawlers, and Python scripts. Such structured representation of task-relevant knowledge enables low-cost models to solve complex tasks effectively while also minimizing bias and noise. For example, KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini. Moreover, harnessing a smaller model dramatically reduces operational costs by over 36x compared to GPT-4o. Improvements for other models (e.g., Qwen2.5-32B and Deepseek-R1-70B) and benchmarks (e.g., SimpleQA) are similar. KGoT offers a scalable, affordable, versatile, and high-performing solution for AI assistants.
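The iterative enhancement described in the abstract can be sketched as a loop that keeps adding tool output to the KG until the answer can be read from it. This is a simplified stand-in under stated assumptions, not the authors' implementation: in KGoT an LLM decides whether to enhance the graph or answer and which tool to call, whereas here the tools run in a fixed order. The names `lookup`, `solve`, `measure_tool`, and `math_tool`, and the rectangle facts, are hypothetical.

```python
from typing import Callable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Tool = Callable[[Set[Triple]], Set[Triple]]


def lookup(kg: Set[Triple], subject: str, predicate: str) -> Optional[str]:
    """Return the object of a matching triple, or None if the KG lacks it."""
    for s, p, o in sorted(kg):
        if s == subject and p == predicate:
            return o
    return None


def measure_tool(kg: Set[Triple]) -> Set[Triple]:
    """Hypothetical stand-in for an external tool that gathers raw facts."""
    return {("rectangle", "width", "3"), ("rectangle", "height", "4")}


def math_tool(kg: Set[Triple]) -> Set[Triple]:
    """Hypothetical math tool: derives the area once width and height are known."""
    width = lookup(kg, "rectangle", "width")
    height = lookup(kg, "rectangle", "height")
    if width is None or height is None:
        return set()  # not enough knowledge in the KG yet
    return {("rectangle", "area", str(int(width) * int(height)))}


def solve(subject: str, predicate: str, tools: List[Tool], max_iterations: int = 5):
    """Enhance the KG tool by tool until the answer can be extracted from it."""
    kg: Set[Triple] = set()
    for step in range(max_iterations):
        answer = lookup(kg, subject, predicate)
        if answer is not None or step >= len(tools):
            return answer, kg
        # Assumption: fixed tool order replaces the LLM's tool choice.
        kg |= tools[step](kg)
    return lookup(kg, subject, predicate), kg


answer, kg = solve("rectangle", "area", [measure_tool, math_tool])
```

The second tool can only succeed because the first one has already written its facts into the graph, which is the sense in which the KG carries intermediate state between tool calls.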

Paper Structure

This paper contains 57 sections, 20 figures, 5 tables.

Figures (20)

  • Figure 1: The key idea behind Knowledge Graph of Thoughts (KGoT): transforming the representation of a task for an AI assistant from a textual form into a knowledge graph (KG). As an example, we use a Level-3 (i.e., highest difficulty) task from the GAIA benchmark. In order to solve the task, KGoT evolves this KG by adding relevant information that brings the task closer to completion. This is achieved by iteratively running various tools. Finally, the task is solved by extracting the relevant information from the KG, using, for example, a graph query or an LLM's inference process with the KG provided as a part of the input prompt. More examples of KGs are in the appendix.
  • Figure 2: Architecture overview of KGoT (top part) and the design details combined with the workflow (bottom part).
  • Figure 3: Advantages of different variants of KGoT over other baselines (Hugging Face Agents using both GPT-4o mini and GPT-4o, Magentic-One, GPTSwarm, two RAG baselines, Zero-Shot GPT-4o mini, and Zero-Shot GPT-4o) on the validation dataset of the GAIA benchmark. DR stands for Direct Retrieval. The model used is GPT-4o mini unless noted otherwise.
  • Figure 4: Performance on the GAIA validation set with KGoT (non-fusion) using various LLMs. For KGoT, we use Cypher queries for knowledge extraction from the Neo4j database.
  • Figure 5: The impact of harnessing knowledge graphs (KGs) with different knowledge extraction methods (graph queries with Neo4j and Cypher, and general-purpose languages with Python and NetworkX) vs. using no KGs at all. DR stands for Direct Retrieval. Model: GPT-4o mini.
  • ...and 15 more figures