$R^3$-NL2GQL: A Model Coordination and Knowledge Graph Alignment Approach for NL2GQL

Yuhang Zhou; Yu He; Siyu Tian; Yuchen Ni; Zhangyue Yin; Xiang Liu; Chuanjun Ji; Sen Liu; Xipeng Qiu; Guangnan Ye; Hongfeng Chai

TL;DR

This work addresses NL2GQL by introducing $R^3$-NL2GQL, a model-coordination framework that assigns distinct roles to smaller ranker/rewriter models and a larger refiner to generate accurate GQL queries. It leverages code-structured graph schemas and code-structured skeletons to convert natural-language inputs into precise GQL, supported by a bilingual, multi-schema dataset and retrieval-alignment techniques. Empirical results show the approach outperforms GPT-family baselines across syntax, semantics, and execution metrics, with ablations confirming the value of code-based prompts and collaborative model interaction. The study demonstrates the practicality of structured, code-oriented representations for graph query generation and sets a foundation for scalable NL2GQL development in diverse schemas and real-world settings.

Abstract

While current tasks of converting natural language to SQL (NL2SQL) using Foundation Models have shown impressive achievements, adapting these approaches for converting natural language to Graph Query Language (NL2GQL) encounters hurdles due to the distinct nature of GQL compared to SQL, alongside the diverse forms of GQL. Moving away from traditional rule-based and slot-filling methodologies, we introduce a novel approach, $R^3$-NL2GQL, integrating both small and large Foundation Models for ranking, rewriting, and refining tasks. This method leverages the interpretative strengths of smaller models for initial ranking and rewriting stages, while capitalizing on the superior generalization and query generation prowess of larger models for the final transformation of natural language queries into GQL formats. Addressing the scarcity of datasets in this emerging field, we have developed a bilingual dataset, sourced from graph database manuals and selected open-source Knowledge Graphs (KGs). Our evaluation of this methodology on this dataset demonstrates its promising efficacy and robustness.
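The three stages named above (ranking, rewriting, refining) can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the authors' implementation: the function names `rank_schema`, `rewrite_question`, `refine`, and `run_pipeline` are hypothetical, the ranker and rewriter are replaced by simple string heuristics instead of smaller Foundation Models, and the larger model is represented by a caller-supplied `generate` callable.

```python
import re


def rank_schema(question, schema_items, top_k=2):
    """Ranker stand-in (assumption): prefer schema items named in the question.

    The paper uses a smaller Foundation Model here; word overlap is only a toy proxy.
    """
    words = set(question.lower().split())
    # Items mentioned in the question sort first; sorted() is stable for ties.
    return sorted(schema_items, key=lambda item: item.lower() not in words)[:top_k]


def rewrite_question(question, stored_values):
    """Rewriter stand-in (assumption): align mentions with values stored in the graph.

    Only case differences are repaired here, e.g. "tim duncan" -> "Tim Duncan".
    """
    for value in stored_values:
        question = re.sub(
            re.escape(value), lambda _m, v=value: v, question, flags=re.IGNORECASE
        )
    return question


def refine(question, schema_items, generate):
    """Refiner stand-in: build a prompt and hand it to the larger model (`generate`)."""
    prompt = f"Schema: {', '.join(schema_items)}\nQuestion: {question}\nGQL:"
    return generate(prompt)


def run_pipeline(question, schema_items, stored_values, generate):
    """Chain the three stages: rank, then rewrite, then refine."""
    selected = rank_schema(question, schema_items)
    aligned = rewrite_question(question, stored_values)
    return refine(aligned, selected, generate)


if __name__ == "__main__":
    # `generate` just echoes the prompt; a real system would call a larger model.
    print(
        run_pipeline(
            "Which team does tim duncan serve",
            ["player", "team", "serve", "follow"],
            ["Tim Duncan", "Tony Parker"],
            generate=lambda prompt: prompt,
        )
    )
```

Passing the larger model in as a callable keeps the sketch runnable without any model dependency and makes the division of labour between the stages explicit.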

Paper Structure (30 sections, 7 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Task Formulation
Code-Structured Graph Schema Description
Code-Structured Skeleton for GQL
NL2GQL Task
$R^3$-NL2GQL Framework
Smaller Foundation Model as Ranker
Smaller Foundation Model as Rewriter
Aligning Data in Graph Databases
Graph Database Storage Principles
Data Retrieval
Larger Foundation Model as Refiner
Data Design
Pair Design
Data Refinement
...and 15 more sections

Figures (10)

Figure 1: Retrieval algorithm based on triplet vectors vs. the GQL-based method.
Figure 2: Examples of the plain-text schema, code-structure schema, and code-structure skeleton. The plain-text schema serves as the vanilla schema prompt and is written in natural language. The code-structure schema re-represents the graph schema in Python, with the aim of enhancing the model's inference capabilities. The code-structure skeleton extracts essential keywords and clause information, focusing on GQL.
Figure 3: An overview of $R^3$-NL2GQL. A smaller white-box model acts as a ranker, selecting the required CRUD functions, clauses, and schema based on the input. Another smaller white-box model serves as a rewriter, aligning the query with the database's underlying key-value storage to mitigate hallucinations. Finally, a larger model generates the GQL, capitalizing on its generalization and generation abilities.
Figure 4: The challenge of aligning user queries with the actual graph data; the error is marked in red.
Figure 5: Data construction pipeline
...and 5 more figures
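
The contrast drawn in the Figure 2 caption between a plain-text schema and a code-structure schema can be sketched as follows. This is a hedged illustration only: the `Tag` and `Edge` base classes, the `Player`, `Team`, and `Serve` types, and the `plain_text_schema` helper are hypothetical names chosen for the example, not the schema format used in the paper.

```python
from dataclasses import dataclass, fields


@dataclass
class Tag:
    """Hypothetical base class for vertex types."""


@dataclass
class Edge:
    """Hypothetical base class for edge types; stores source and destination IDs."""
    src: str
    dst: str


@dataclass
class Player(Tag):
    name: str
    age: int


@dataclass
class Team(Tag):
    name: str


@dataclass
class Serve(Edge):
    start_year: int
    end_year: int


def plain_text_schema(*types):
    """Render the code-structured schema back into one plain-text line per type."""
    lines = []
    for t in types:
        kind = "edge" if issubclass(t, Edge) else "tag"
        props = ", ".join(f"{f.name}: {f.type.__name__}" for f in fields(t))
        lines.append(f"{kind} {t.__name__}({props})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(plain_text_schema(Player, Team, Serve))
```

In this form the class definitions themselves would be the code-structured prompt, while the output of `plain_text_schema` approximates a flat, plain-text listing of the same types and properties.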
