Text2Cypher: Bridging Natural Language and Graph Databases

Makbule Gulcin Ozsoy, Leila Messallem, Jon Besga, Gianandrea Minneci

TL;DR

This work tackles translating natural language queries into Cypher for graph databases by constructing a large, unified Text2Cypher dataset from public sources and evaluating a broad set of models. It demonstrates that domain-specific fine-tuning on the assembled 44,387-instance corpus yields clear performance gains over baselines in both translation-based and execution-based evaluations. The results show that fine-tuned models can substantially improve NL-to-Cypher translation, supporting more accessible knowledge-graph querying. The public dataset and benchmarking framework provide a valuable resource for advancing natural language interfaces to graph databases and knowledge graphs.
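Execution-based evaluation compares what the generated and reference queries return rather than how they are written. The sketch below is a hypothetical illustration, not the authors' evaluation code. It assumes each query has already been run against the target graph database and its result rows collected as Python dicts, and the helper names `normalize_rows` and `execution_match` are invented for this example. Ignoring column aliases and row order is one possible design choice; a stricter variant would keep row order for queries that use `ORDER BY`.

```python
from collections import Counter
from typing import Any


def normalize_rows(rows: list[dict[str, Any]]) -> Counter:
    """Convert result rows into a multiset of value tuples.

    Column aliases are dropped (only values are kept, in column order) and
    row order is ignored. repr() makes nested values such as lists hashable.
    """
    return Counter(tuple(repr(value) for value in row.values()) for row in rows)


def execution_match(pred_rows: list[dict[str, Any]], ref_rows: list[dict[str, Any]]) -> bool:
    """True if both queries returned the same rows, up to aliases and row order."""
    return normalize_rows(pred_rows) == normalize_rows(ref_rows)


# Hypothetical results: the reference query aliases the column as "movies";
# the generated query returns the same titles under another alias and order.
ref_rows = [{"movies": "Big"}, {"movies": "Cast Away"}]
pred_rows = [{"title": "Cast Away"}, {"title": "Big"}]
print(execution_match(pred_rows, ref_rows))  # True
```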

Abstract

Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When a knowledge graph is stored in a graph database, the Cypher query language enables efficient modeling and querying. However, using Cypher requires specialized knowledge, which can be a challenge for non-expert users. Our work, Text2Cypher, aims to bridge this gap by translating natural language queries into the Cypher query language, extending the utility of knowledge graphs to users without technical expertise. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the limited availability of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how we combined, cleaned, and organized several publicly available datasets into a total of 44,387 instances, enabling effective fine-tuning and evaluation. Models fine-tuned on this dataset showed significant performance gains, with improvements in Google-BLEU and Exact Match scores over baseline models, highlighting the importance of high-quality datasets and fine-tuning in improving Text2Cypher performance.
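Both metrics named in the abstract can be computed directly on the query text (translation-based evaluation). The following is a minimal sketch, assuming whitespace tokenization and the sentence-level Google-BLEU (GLEU) definition: the minimum of n-gram precision and n-gram recall over 1- to 4-grams. The paper's exact tokenization and normalization may differ, and the function names are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens: list[str], max_n: int = 4) -> Counter:
    """Count all 1- to max_n-grams; n-grams of different lengths never collide."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def google_bleu(hypothesis: str, reference: str) -> float:
    """Sentence-level Google-BLEU: min(precision, recall) over 1- to 4-grams.

    min(m / hyp_total, m / ref_total) equals m / max(hyp_total, ref_total).
    """
    hyp = ngram_counts(hypothesis.split())
    ref = ngram_counts(reference.split())
    matches = sum((hyp & ref).values())  # clipped n-gram overlap
    total = max(sum(hyp.values()), sum(ref.values()))
    return matches / total if total else 0.0


def exact_match(hypothesis: str, reference: str) -> bool:
    """Exact Match after collapsing whitespace (this normalization is an assumption)."""
    return hypothesis.split() == reference.split()


reference = ('MATCH (actor:Person {name: "Tom Hanks"})-[:ACTED_IN]->(movie:Movie) '
             "RETURN movie.title AS movies")
generated = ('MATCH (p:Person {name: "Tom Hanks"})-[:ACTED_IN]->(m:Movie) '
             "RETURN m.title")
print(exact_match(generated, reference))            # False
print(round(google_bleu(generated, reference), 3))  # 0.167
```

Whitespace tokenization treats `Hanks"})-[:ACTED_IN]->(movie:Movie)` as a single token, so renaming one variable costs several n-gram matches; a Cypher-aware tokenizer would likely score these two queries higher.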

Paper Structure

This paper contains 18 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: A user wants to write a Cypher query for 'What are the movies of Tom Hanks?'. A Text2Cypher model translates the input natural language question into Cypher, producing 'MATCH (actor:Person {name: "Tom Hanks"})-[:ACTED_IN]->(movie:Movie) RETURN movie.title AS movies'
  • Figure 2: Relational databases use SQL-based query languages, while graph databases commonly use the Cypher query language. The figure shows an example representation of Person, Location, Gender and Marriage entities and their relationships in a relational and a graph database.
  • Figure 3: Data distribution: the train and test splits account for ~89% and ~11% of the overall data, respectively.
  • Figure 4: Performance comparison of the baseline and fine-tuned models
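Figure 1's example can also be viewed as a single fine-tuning instance: a question, the database schema, and the target Cypher query. The sketch below shows one hypothetical way to turn such an instance into a prompt/completion pair for supervised fine-tuning. The field names, the schema string, and `PROMPT_TEMPLATE` are illustrative assumptions, not the dataset's documented format or the authors' prompt.

```python
from dataclasses import dataclass


@dataclass
class Text2CypherExample:
    """One (question, schema, Cypher) triple; field names are hypothetical."""
    question: str
    schema: str
    cypher: str


# Hypothetical instruction template; the authors' actual prompt may differ.
# Only {schema} and {question} are placeholders, so braces inside the
# substituted schema text (e.g. "{name: STRING}") are left untouched.
PROMPT_TEMPLATE = (
    "Generate a Cypher statement to answer the question.\n"
    "Use only the labels, relationship types and properties in the schema.\n\n"
    "Schema:\n{schema}\n\n"
    "Question:\n{question}\n\n"
    "Cypher query:\n"
)


def to_training_pair(example: Text2CypherExample) -> tuple[str, str]:
    """Return (prompt, completion); the model is trained to emit the completion."""
    prompt = PROMPT_TEMPLATE.format(schema=example.schema, question=example.question)
    return prompt, example.cypher


example = Text2CypherExample(
    question="What are the movies of Tom Hanks?",
    # Hypothetical schema summary for a small movie graph.
    schema=(
        "Node properties: Person {name: STRING}, Movie {title: STRING}\n"
        "Relationships: (:Person)-[:ACTED_IN]->(:Movie)"
    ),
    cypher=(
        'MATCH (actor:Person {name: "Tom Hanks"})-[:ACTED_IN]->(movie:Movie) '
        "RETURN movie.title AS movies"
    ),
)

prompt, completion = to_training_pair(example)
print(prompt + completion)
```

One reason to put the schema in the prompt is that the same question maps to different Cypher on different graphs, so the model needs the labels and relationship types of the target database to produce a valid query.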