GraphSeek: Next-Generation Graph Analytics with LLMs

Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler

TL;DR

GraphSeek tackles the challenge of making NL-driven graph analytics feasible on industry-scale property graphs by grounding schema and operations in a Semantic Catalog and separating high-level LLM reasoning (Semantic Plane) from deterministic execution (Execution Plane). It introduces an Execution Model extended from TAG and a two-plane architecture, supported by an Adaptive Toolset and a Hybrid Data Store, enabling token-efficient planning and exact backend execution. Across EV-manufacturing and WikiDataSets benchmarks, GraphSeek achieves higher success rates (e.g., 86% over enhanced LangChain), lower and more stable token costs, and interactive latencies (~2–3 seconds) versus baseline frameworks, with ablations confirming the value of the Semantic Catalog and modular tools. These results provide a practical blueprint for scalable, affordable, LLM-assisted graph analytics that preserve database-grade semantics while accommodating evolving, heterogeneous graphs.

Abstract

Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally complex, and evolve dynamically. To address this, we devise a novel abstraction for complex multi-query analytics over such graphs. Its key idea is to replace brittle generation of graph queries directly from NL with planning over a Semantic Catalog that describes both the graph schema and the graph operations. Concretely, this induces a clean separation between a Semantic Plane for LLM planning and broader reasoning, and an Execution Plane for deterministic, database-grade query execution over the full dataset and tool implementations. This design yields substantial gains in both token efficiency and task effectiveness even with small-context LLMs. We use this abstraction as the basis of the first LLM-enhanced graph analytics framework called GraphSeek. GraphSeek achieves substantially higher success rates (e.g., 86% over enhanced LangChain) and points toward the next generation of affordable and accessible graph analytics that unify LLM reasoning with database-grade execution over large and complex property graphs.
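
The separation described in the abstract can be illustrated with a small Python sketch. It is not GraphSeek's implementation: the catalog layout, the operation names (`find_nodes`, `neighbors`, `count`), the `$name` reference convention, and the hard-coded `stub_planner` standing in for the LLM are all hypothetical choices made for illustration. The sketch only shows the general idea that the planner sees a catalog of schema and operations, never the data, while a deterministic executor validates and runs the resulting plan over the full graph.

```python
from dataclasses import dataclass
from typing import Callable

# Toy property graph (hypothetical data); it is visible only to the execution plane.
NODES = {
    1: {"label": "Supplier", "name": "Acme"},
    2: {"label": "Part", "name": "Battery"},
    3: {"label": "Part", "name": "Motor"},
}
EDGES = [(1, "SUPPLIES", 2), (1, "SUPPLIES", 3)]


# Deterministic tool implementations (execution plane).
def find_nodes(label: str, name: str) -> list[int]:
    return sorted(n for n, p in NODES.items()
                  if p["label"] == label and p["name"] == name)


def neighbors(sources: list[int], edge_type: str) -> list[int]:
    return sorted({dst for src, et, dst in EDGES
                   if src in sources and et == edge_type})


def count(items: list[int]) -> int:
    return len(items)


@dataclass(frozen=True)
class Operation:
    description: str          # text shown to the planner
    params: tuple[str, ...]   # parameter names the planner must supply
    impl: Callable            # never shown to the planner


# Hypothetical catalog: describes the schema and the operations, not the data.
CATALOG = {
    "schema": {"node_labels": ["Supplier", "Part"], "edge_types": ["SUPPLIES"]},
    "operations": {
        "find_nodes": Operation("Find nodes by label and name", ("label", "name"), find_nodes),
        "neighbors": Operation("Follow edges of one type", ("sources", "edge_type"), neighbors),
        "count": Operation("Count items in a list", ("items",), count),
    },
}


def catalog_prompt() -> str:
    """Render the catalog as the compact text a planner would receive."""
    lines = [f"labels={CATALOG['schema']['node_labels']}",
             f"edges={CATALOG['schema']['edge_types']}"]
    for name, op in CATALOG["operations"].items():
        lines.append(f"{name}({', '.join(op.params)}): {op.description}")
    return "\n".join(lines)


def stub_planner(question: str) -> list[dict]:
    """Stand-in for the LLM (semantic plane): returns a fixed plan for the demo question."""
    return [
        {"op": "find_nodes", "args": {"label": "Supplier", "name": "Acme"}, "out": "s"},
        {"op": "neighbors", "args": {"sources": "$s", "edge_type": "SUPPLIES"}, "out": "parts"},
        {"op": "count", "args": {"items": "$parts"}, "out": "answer"},
    ]


def execute(plan: list[dict]):
    """Validate each step against the catalog, then run it deterministically."""
    env: dict[str, object] = {}
    for step in plan:
        op = CATALOG["operations"].get(step["op"])
        if op is None:
            raise ValueError(f"unknown operation: {step['op']}")
        if set(step["args"]) != set(op.params):
            raise ValueError(f"bad arguments for {step['op']}")
        # Resolve "$name" references to the outputs of earlier steps.
        args = {k: env[v[1:]] if isinstance(v, str) and v.startswith("$") else v
                for k, v in step["args"].items()}
        env[step["out"]] = op.impl(**args)
    return env[plan[-1]["out"]]


result = execute(stub_planner("How many parts does Acme supply?"))
```

In this toy setup the planner's input, `catalog_prompt()`, stays the same size no matter how large the graph grows, which is one way to read the token-efficiency claim. A plan that names an operation outside the catalog is rejected before anything runs.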

Paper Structure (49 sections, 2 equations, 7 figures)

Figures (7)

  • Figure 1: Overview of the key idea and architectural pillars behind GraphSeek: the Semantic Catalog makes in-house graph data, graph operations, and other elements legible to the LLM without fine-tuning or long prompts; the separation of the Semantic and Execution Planes lets the LLM focus on high-level analytic reasoning and enables both effective and efficient data reasoning.
  • Figure 2: Overview of research areas related to this work. GraphSeek sits on the overlap of Traditional Graph Analytics, Large Language Models, and Graph Representation Learning.
  • Figure 3: Overview of the GraphSeek framework; detailed workflows for the task lifecycle and tool adaptation are in Appendix \ref{sec:app:arch}.
  • Figure 4: Results for the graph analytics queries. Details of individual queries and the explanation of query groups are in Appendix \ref{sec:app:queries:industry}.
  • Figure 5: Results for the Countries benchmark of WikiDataSets. Details of queries and their groups are in Appendix \ref{sec:app:queries:countries}.
  • ...and 2 more figures