CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era

Yanlin Feng, Simone Papicchio, Sajjadur Rahman

TL;DR

CypherBench tackles precise retrieval over full-scale modern knowledge graphs by transforming RDF data into domain-specific property graphs that Cypher can efficiently query. It identifies fundamental RDF challenges for LLM-based retrieval and presents an RDF-to-property-graph transformation pipeline, an 11-graph Wikidata-derived benchmark (7.8M entities, 10k+ questions), and a text-to-Cypher task-generation framework with the evaluation metrics EX and PSJS. Zero-shot evaluations across multiple LLMs show substantial gaps, with top models achieving roughly 60% EX and 81% PSJS, underscoring the difficulty of CypherBench and the need for graph-aware prompting and tooling. The work offers a practical pathway to integrate full-scale knowledge graphs with LLMs, provides a rigorous benchmark for graph retrieval, and lays groundwork for future improvements in domain-specific graph views and Cypher-based retrieval in GraphRAG systems.

Abstract

Retrieval from graph data is crucial for augmenting large language models (LLMs) with both open-domain knowledge and private enterprise data, and it is also a key component in the recent GraphRAG system (Edge et al., 2024). Despite decades of research on knowledge graphs and knowledge base question answering, leading LLM frameworks (e.g. LangChain and LlamaIndex) have only minimal support for retrieval from modern encyclopedic knowledge graphs like Wikidata. In this paper, we analyze the root cause and suggest that modern RDF knowledge graphs (e.g. Wikidata, Freebase) are less efficient for LLMs due to overly large schemas that far exceed the typical LLM context window, use of resource identifiers, overlapping relation types, and lack of normalization. As a solution, we propose property graph views on top of the underlying RDF graph that can be efficiently queried by LLMs using Cypher. We instantiated this idea on Wikidata and introduced CypherBench, the first benchmark with 11 large-scale, multi-domain property graphs with 7.8 million entities and over 10,000 questions. To achieve this, we tackled several key challenges, including developing an RDF-to-property-graph conversion engine, creating a systematic pipeline for text-to-Cypher task generation, and designing new evaluation metrics.

Paper Structure

This paper contains 42 sections, 2 equations, 6 figures, 15 tables.

Figures (6)

  • Figure 1: An illustration of Cypher as a unified interface for retrieval over both RDF and property graphs. A typical graph retrieval or RAG workflow involves: 1) text-to-Cypher translation using an LLM, 2) Cypher query execution, and optionally, 3) final answer generation.
  • Figure 2: CypherBench construction process: Wikidata is transformed into schema-enforced property graphs, which enables efficient and accurate text-to-Cypher querying. These property graphs are then used to generate text-to-Cypher tasks.
  • Figure 3: Schema of the company graph with entity properties and relation properties. See the CypherBench appendix for other graphs.
  • Figure 4: Distribution of graph matching patterns, RETURN templates, domains, and answer lengths (number of rows in the answer) in the CypherBench test set.
  • Figure 5: Performance across basic and special MATCH patterns, RETURN templates and domains.
  • ...and 1 more figure
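
The three-step workflow described in the Figure 1 caption (text-to-Cypher translation, query execution, optional answer generation) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: `GraphRetriever` and the `toy_*` functions are hypothetical names, the LLM and graph database are replaced by canned stand-ins, and the `Company` label and `name` property in the query are assumed for the example only.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Union

Rows = List[Dict[str, Any]]


@dataclass
class GraphRetriever:
    """Hypothetical wrapper around the three workflow steps."""
    translate: Callable[[str], str]   # step 1: question -> Cypher (an LLM call in practice)
    execute: Callable[[str], Rows]    # step 2: Cypher -> result rows (a graph database in practice)
    generate: Optional[Callable[[str, Rows], str]] = None  # step 3: optional answer generation

    def answer(self, question: str) -> Union[Rows, str]:
        cypher = self.translate(question)
        rows = self.execute(cypher)
        if self.generate is None:
            return rows  # retrieval only: return the raw query result
        return self.generate(question, rows)


def toy_translate(question: str) -> str:
    # Stand-in for an LLM: always returns the same (assumed) Cypher query.
    return "MATCH (c:Company) RETURN c.name AS name"


def toy_execute(cypher: str) -> Rows:
    # Stand-in for a graph database: returns made-up rows, not real data.
    return [{"name": "ExampleCorp"}]


def toy_generate(question: str, rows: Rows) -> str:
    # Stand-in for final answer generation: joins the retrieved names.
    return ", ".join(str(row["name"]) for row in rows)


retrieval_only = GraphRetriever(toy_translate, toy_execute)
full_pipeline = GraphRetriever(toy_translate, toy_execute, toy_generate)
print(retrieval_only.answer("Which companies are in the graph?"))
print(full_pipeline.answer("Which companies are in the graph?"))
```

Keeping each step behind a plain callable means the translator or the execution backend can be swapped without touching the rest of the pipeline, which mirrors the caption's framing of Cypher as the common interface between the LLM and the graph.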