Auto-Cypher: Improving LLMs on Cypher generation via LLM-supervised generation-verification framework
Aman Tiwari, Shiva Krishna Reddy Malay, Vikas Yadav, Masoud Hashemi, Sathwik Tejaswi Madhusudhan
TL;DR
Auto-Cypher addresses the lack of training data for Text2Cypher by introducing SynthCypher, a fully automated, LLM-supervised data-generation and validation pipeline that uses LLMs as database fillers to ensure that the generated Cypher queries are executable. The pipeline yields a large, diverse synthetic dataset spanning 109 query types and 700 domains for supervised fine-tuning of open-source LLMs; evaluation also uses SPIDER-Cypher, an adaptation of the SPIDER benchmark to graph databases. Fine-tuning open models on SynthCypher achieves substantial gains (up to 40 percentage points for 7B/8B models on the Text2Cypher test split and ~30 points on SPIDER-Cypher), demonstrating the value of guided, executable data generation for NL-to-Cypher tasks. The work provides practical benchmarks and a scalable approach to improving Cypher generation for graph databases like Neo4j, with implications for broader NL-to-graph-query systems.
Abstract
Graph databases like Neo4j are gaining popularity over traditional relational databases for modeling and querying complex, interconnected data. While translating natural language into SQL queries is well-researched, generating Cypher queries for Neo4j remains relatively underexplored. In this work, we present an automated, LLM-supervised pipeline to generate high-quality synthetic data for Text2Cypher. Our Cypher data generation pipeline introduces LLM-As-Database-Filler, a novel strategy for ensuring Cypher query correctness and thus high-quality generations. Using our pipeline, we generate SynthCypher, a high-quality Text2Cypher dataset containing 29.8k instances across various domains and queries of varying complexity. Training open-source LLMs like LLaMa-3.1-8B, Mistral-7B, and QWEN-7B on SynthCypher results in performance gains of up to 40% on the Text2Cypher test split and 30% on the SPIDER benchmark, adapted for graph databases.
