GraphSeek: Next-Generation Graph Analytics with LLMs
Maciej Besta, Łukasz Jarmocik, Orest Hrycyna, Shachar Klaiman, Konrad Mączka, Robert Gerstenberger, Jürgen Müller, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler
TL;DR
GraphSeek makes natural-language (NL) graph analytics feasible on industry-scale property graphs by grounding the schema and operations in a Semantic Catalog and separating high-level LLM reasoning (the Semantic Plane) from deterministic execution (the Execution Plane). It introduces an Execution Model extended from TAG and a two-plane architecture, supported by an Adaptive Toolset and a Hybrid Data Store, which together enable token-efficient planning and exact backend execution. Compared with baseline frameworks on EV-manufacturing and WikiDataSets benchmarks, GraphSeek achieves higher success rates (e.g., 86% over enhanced LangChain), lower and more stable token costs, and interactive latencies of roughly 2–3 seconds, with ablations confirming the value of the Semantic Catalog and modular tools. These results provide a practical blueprint for scalable, affordable, LLM-assisted graph analytics that preserves database-grade semantics while accommodating evolving, heterogeneous graphs.
Abstract
Graphs are foundational across domains but remain hard to use without deep expertise. LLMs promise accessible natural language (NL) graph analytics, yet they fail to process industry-scale property graphs effectively and efficiently: such datasets are large, highly heterogeneous, structurally complex, and evolve dynamically. To address this, we devise a novel abstraction for complex multi-query analytics over such graphs. Its key idea is to replace brittle generation of graph queries directly from NL with planning over a Semantic Catalog that describes both the graph schema and the graph operations. Concretely, this induces a clean separation between a Semantic Plane for LLM planning and broader reasoning, and an Execution Plane for deterministic, database-grade query execution over the full dataset and tool implementations. This design yields substantial gains in both token efficiency and task effectiveness even with small-context LLMs. We use this abstraction as the basis of the first LLM-enhanced graph analytics framework called GraphSeek. GraphSeek achieves substantially higher success rates (e.g., 86% over enhanced LangChain) and points toward the next generation of affordable and accessible graph analytics that unify LLM reasoning with database-grade execution over large and complex property graphs.
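To make the two-plane separation described above more concrete, the following is a minimal sketch, not GraphSeek's actual implementation or API. It assumes a toy in-memory property graph and a hard-coded planner standing in for the LLM. Every name here (`SEMANTIC_CATALOG`, `Operation`, `plan`, `execute`, and the three tools) is hypothetical and chosen only for illustration. The planner sees only the catalog and emits a plan of named operations; the executor validates each step against the catalog and runs it deterministically over the full data.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable

# Toy property graph (illustrative data, not from the paper).
NODES = {
    1: {"label": "Supplier", "name": "Acme"},
    2: {"label": "Part", "name": "Battery"},
    3: {"label": "Part", "name": "Motor"},
}
EDGES = [(1, "SUPPLIES", 2), (1, "SUPPLIES", 3)]  # (source, type, target)


@dataclass(frozen=True)
class Operation:
    """Hypothetical catalog entry: what the planner reads, plus the tool it maps to."""
    description: str
    params: tuple[str, ...]
    run: Callable[..., Any]


# --- Execution Plane: deterministic tools that touch the full dataset ---
def find_nodes(label: str, name: str) -> list[int]:
    return sorted(n for n, p in NODES.items() if p["label"] == label and p["name"] == name)


def neighbors(node_ids: list[int], edge_type: str) -> list[int]:
    return sorted({dst for src, typ, dst in EDGES if typ == edge_type and src in node_ids})


def count(node_ids: list[int]) -> int:
    return len(node_ids)


# --- Semantic Catalog: describes both the schema and the available operations ---
SEMANTIC_CATALOG = {
    "schema": {"node_labels": ["Supplier", "Part"], "edge_types": ["SUPPLIES"]},
    "operations": {
        "find_nodes": Operation("Look up nodes by label and name", ("label", "name"), find_nodes),
        "neighbors": Operation("Follow edges of one type", ("node_ids", "edge_type"), neighbors),
        "count": Operation("Count nodes in a result", ("node_ids",), count),
    },
}


# --- Semantic Plane: stand-in for the LLM planner (assumption: fixed output) ---
def plan(question: str) -> list[dict]:
    """A real planner would prompt an LLM with the catalog; this stub returns a fixed plan."""
    return [
        {"op": "find_nodes", "args": {"label": "Supplier", "name": "Acme"}, "out": "s"},
        {"op": "neighbors", "args": {"node_ids": "$s", "edge_type": "SUPPLIES"}, "out": "p"},
        {"op": "count", "args": {"node_ids": "$p"}, "out": "n"},
    ]


def execute(steps: list[dict]) -> dict[str, Any]:
    """Validate each step against the catalog, then run it; '$x' refers to an earlier result."""
    env: dict[str, Any] = {}
    for step in steps:
        op = SEMANTIC_CATALOG["operations"].get(step["op"])
        if op is None:
            raise ValueError(f"operation not in catalog: {step['op']}")
        if set(step["args"]) != set(op.params):
            raise ValueError(f"bad arguments for {step['op']}: {sorted(step['args'])}")
        args = {
            key: env[val[1:]] if isinstance(val, str) and val.startswith("$") else val
            for key, val in step["args"].items()
        }
        env[step["out"]] = op.run(**args)
    return env


if __name__ == "__main__":
    print(execute(plan("How many parts does Acme supply?"))["n"])
```

In this sketch the planner never sees the node or edge data, only operation names and parameters, which is one way to read the abstract's claim about token efficiency with small-context LLMs. Rejecting any step that is not in the catalog keeps execution deterministic even if the planner's output is malformed.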
