Dial: A Knowledge-Grounded Dialect-Specific NL2SQL System

Xiang Zhang, Hongming Xu, Le Zhou, Wei Zhou, Xuanhe Zhou, Guoliang Li, Yuyu Luo, Changdong Liu, Guorun Chen, Jiang Liao, Fan Wu

TL;DR

Dial is a knowledge-grounded framework for dialect-specific NL2SQL. It converts natural language into a dialect-aware logical query plan via operator-level intent decomposition and divergence-aware specification, and it applies an execution-driven debugging and semantic verification loop that separates syntactic recovery from logic auditing to prevent semantic drift.

Abstract

Enterprises commonly deploy heterogeneous database systems, each with its own SQL dialect that differs in syntax rules, built-in functions, and execution constraints. However, most existing NL2SQL methods assume a single dialect (e.g., SQLite) and struggle to produce queries that are both semantically correct and executable on target engines. Prompt-based approaches tightly couple intent reasoning with dialect syntax, rule-based translators often degrade native operators into generic constructs, and multi-dialect fine-tuning suffers from cross-dialect interference. In this paper, we present Dial, a knowledge-grounded framework for dialect-specific NL2SQL. Dial introduces: (1) a Dialect-Aware Logical Query Planning module that converts natural language into a dialect-aware logical query plan via operator-level intent decomposition and divergence-aware specification; (2) HINT-KB, a hierarchical intent-aware knowledge base that organizes dialect knowledge into (i) a canonical syntax reference, (ii) a declarative function repository, and (iii) a procedural constraint repository; and (3) an execution-driven debugging and semantic verification loop that separates syntactic recovery from logic auditing to prevent semantic drift. We construct DS-NL2SQL, a benchmark covering six major database systems with 2,218 dialect-specific test cases. Experimental results show that Dial consistently improves translation accuracy by 10.25% and dialect feature coverage by 15.77% over state-of-the-art baselines. The code is available at https://github.com/weAIDB/Dial.
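
To make the third component more concrete, the following is a minimal sketch of a loop that keeps syntactic recovery and logic auditing as separate steps. It is not the authors' implementation: SQLite stands in for the target engine, and `repair_syntax`, `audit_logic`, and `debug_and_verify` are hypothetical names with toy rule-based bodies (a real system would presumably consult a knowledge base and a language model at these points).

```python
import re
import sqlite3


def repair_syntax(sql: str, error: str) -> str:
    """Hypothetical syntactic-recovery step: rewrite only the rejected clause."""
    # Toy rule (assumption): SQLite does not accept FETCH FIRST, so map it to LIMIT.
    # The error message is accepted for interface completeness but unused here.
    return re.sub(r"FETCH FIRST (\d+) ROWS ONLY", r"LIMIT \1", sql, flags=re.IGNORECASE)


def audit_logic(rows, expected_max_rows: int) -> bool:
    """Hypothetical logic audit: check the executable result against the intent."""
    return rows is not None and len(rows) <= expected_max_rows


def debug_and_verify(conn, sql: str, expected_max_rows: int, max_rounds: int = 3):
    """Retry until the query executes, then audit its result as a separate step."""
    for _ in range(max_rounds):
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Syntactic recovery: the query's intent is left untouched.
            sql = repair_syntax(sql, str(exc))
            continue
        # Logic auditing runs only once the query is executable.
        return sql, rows, audit_logic(rows, expected_max_rows)
    return sql, None, False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("Ann", 300), ("Bob", 200), ("Cy", 100)],
)

final_sql, rows, ok = debug_and_verify(
    conn,
    "SELECT name FROM emp ORDER BY salary DESC FETCH FIRST 2 ROWS ONLY",
    expected_max_rows=2,
)
```

Keeping the repair step restricted to the rejected clause is what this sketch uses to illustrate the idea of avoiding semantic drift: the fix makes the query executable without changing which rows the user asked for, and the audit then checks the result independently.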

Paper Structure (27 sections, 6 figures, 6 tables)

This paper contains 27 sections, 6 figures, 6 tables.

Figures (6)

  • Figure 1: Dialect-Specific NL2SQL Failures -- Case 1: Oracle rejects MySQL-style LIMIT. Case 2: Oracle's CONCAT accepts only two arguments, unlike MySQL's variadic version. Case 3: PostgreSQL requires ORDER BY expressions under SELECT DISTINCT to appear in the select list.
  • Figure 2: Dialect-Specific Error Analysis -- Top: total error rate; Bottom: non-executable rate. Errors are grouped into Unsupported Syntax (U), Incorrect Usage (M), and Implicit Constraints (I). The gap between the two rows reflects semantic drift (queries that execute but return incorrect results).
  • Figure 3: System Overview of Dial.
  • Figure 4: Dialect Knowledge Base Construction.
  • Figure 5: Dialect-Aware Logical Query Planning.
  • ...and 1 more figure
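
The first two failure cases in Figure 1 can be illustrated with a small sketch of dialect-aware clause rendering. The function names `render_row_limit` and `render_concat` are hypothetical and are not taken from the Dial codebase; the sketch assumes Oracle 12c or later for the row-limiting clause and MySQL's default SQL mode, in which `||` is not a concatenation operator.

```python
from typing import List


def render_row_limit(dialect: str, n: int) -> str:
    """Render a 'first n rows' clause for the given dialect (illustrative only)."""
    if dialect == "oracle":
        # Oracle has no LIMIT keyword; 12c+ supports the row-limiting clause.
        return f"FETCH FIRST {n} ROWS ONLY"
    if dialect == "mysql":
        return f"LIMIT {n}"
    raise ValueError(f"unsupported dialect: {dialect}")


def render_concat(dialect: str, parts: List[str]) -> str:
    """Render string concatenation of several SQL expressions (illustrative only)."""
    if dialect == "oracle":
        # Oracle's CONCAT takes exactly two arguments, so chain with || instead.
        return " || ".join(parts)
    if dialect == "mysql":
        # MySQL's CONCAT is variadic.
        return "CONCAT(" + ", ".join(parts) + ")"
    raise ValueError(f"unsupported dialect: {dialect}")


oracle_limit = render_row_limit("oracle", 5)
mysql_limit = render_row_limit("mysql", 5)
oracle_concat = render_concat("oracle", ["first_name", "' '", "last_name"])
mysql_concat = render_concat("mysql", ["first_name", "' '", "last_name"])
```

A lookup of this kind only covers surface syntax. Case 3 (the PostgreSQL rule tying ORDER BY to the DISTINCT select list) is a constraint on how clauses interact, which a per-clause renderer like this one does not capture.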