A Plug-and-Play Natural Language Rewriter for Natural Language to SQL

Peixian Ma, Boyan Li, Runzhi Jiang, Ju Fan, Nan Tang, Yuyu Luo

TL;DR

NL2SQL systems struggle when user queries are unclear, typically because users have limited knowledge of the schema or misremember specific table or column values. The paper introduces REWRITER, a plug-and-play, multi-agent framework consisting of Checker, Reflector, and Rewriter agents plus a Memory module; it rewrites flawed NL queries using database content so that they align with the underlying DB. REWRITER treats NL2SQL models as black boxes, which keeps it compatible with diverse systems, and it uses self-reflection to learn rewriting strategies without large labeled datasets. Empirical evaluation on Spider and BIRD shows consistent improvements in execution accuracy and exact-match accuracy, indicating improved robustness and fewer hallucinations in downstream SQL generation. The work offers a practical, adaptable approach to narrowing the NL-to-SQL gap in real-world deployments.

Abstract

Existing Natural Language to SQL (NL2SQL) solutions have made significant advancements, yet challenges persist in interpreting and translating NL queries, primarily due to users' limited understanding of database schemas or memory biases toward specific table or column values. These challenges often result in incorrect NL2SQL translations. To address these issues, we propose REWRITER, a plug-and-play module designed to enhance NL2SQL systems by automatically rewriting ambiguous or flawed NL queries. By incorporating database knowledge and content (e.g., column values and foreign keys), REWRITER reduces errors caused by flawed NL inputs and improves SQL generation accuracy. Our REWRITER treats NL2SQL models as black boxes, ensuring compatibility with various NL2SQL methods, including agent-based and rule-based NL2SQL solutions. REWRITER comprises three key components: Checker, Reflector, and Rewriter. The Checker identifies flawed NL queries by assessing the correctness of the generated SQL, minimizing unnecessary rewriting and potential hallucinations. The Reflector analyzes and accumulates experience to identify issues in NL queries, while the Rewriter revises the queries based on Reflector's feedback. Extensive experiments on the Spider and BIRD benchmarks demonstrate that REWRITER consistently enhances downstream models, achieving average improvements of 1.6% and 2.0% in execution accuracy, respectively.
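
The Checker, Reflector, and Rewriter flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Memory`, `rewrite_if_flawed`, the `toy_*` stand-ins) is hypothetical, the agents are plain callables rather than LLM prompts, and the NL2SQL model is treated as an opaque function from NL to SQL.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical shared store for reflections produced by the Reflector."""
    reflections: List[str] = field(default_factory=list)


def rewrite_if_flawed(
    nl: str,
    nl2sql: Callable[[str], str],                   # black-box NL2SQL model
    checker: Callable[[str, str], bool],            # True if SQL seems to match the NL
    reflector: Callable[[str, str, Memory], str],   # explains what is wrong with the NL
    rewriter: Callable[[str, str], str],            # revises the NL using the reflection
    memory: Memory,
    max_rounds: int = 1,
) -> Tuple[str, str]:
    """Return the (possibly rewritten) NL and the SQL generated from it."""
    sql = nl2sql(nl)
    for _ in range(max_rounds):
        if checker(nl, sql):
            break  # NL is judged fine: skip rewriting entirely
        reflection = reflector(nl, sql, memory)
        memory.reflections.append(reflection)
        nl = rewriter(nl, reflection)
        sql = nl2sql(nl)  # regenerate SQL from the rewritten NL
    return nl, sql


# Toy stand-ins (assumptions for illustration only).
def toy_nl2sql(nl: str) -> str:
    if "concert.singer_id" in nl:
        return "SELECT s.name FROM singer s JOIN concert c ON c.singer_id = s.id"
    return "SELECT name FROM singer"


def toy_checker(nl: str, sql: str) -> bool:
    return "JOIN" in sql


def toy_reflector(nl: str, sql: str, memory: Memory) -> str:
    return "State the foreign key concert.singer_id = singer.id explicitly."


def toy_rewriter(nl: str, reflection: str) -> str:
    return nl + " (link tables via concert.singer_id = singer.id)"


memory = Memory()
final_nl, final_sql = rewrite_if_flawed(
    "Which singers performed in a concert?",
    toy_nl2sql, toy_checker, toy_reflector, toy_rewriter, memory,
)
```

Because the loop only ever calls `nl2sql(nl)`, any NL2SQL system can be dropped in without modification, which is the sense in which the module is plug-and-play. The early `break` mirrors the Checker's stated role of avoiding unnecessary rewriting.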

Paper Structure

This paper contains 30 sections, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Demonstration of our work. The proposed plug-and-play Rewriter clearly indicates the relationships of foreign keys between tables in the rewritten NL, thereby avoiding the generation of incorrect SQL statements due to the gap between unclear user intent and the DB.
  • Figure 2: An overview of the proposed Rewriter framework, which comprises the following components: (i) Checker, which determines whether the NL matches the generated SQL; (ii) Reflector, which analyzes the flawed NL and produces a rewriting reflection with reference to the DB; (iii) Rewriter, which rewrites the flawed NL under the guidance of the reflection. In addition, a task-specific Memory module provides information exchange and storage services for these agents.
  • Figure 3: Demonstration of the self-reflection mechanism. In the rewriting process, the experiences with the highest weights are loaded into the rules. Subsequently, the Reflector updates the weights of the experiences that were applied in the detailed reflection, based on the Checker's feedback on the new results.
  • Figure 4: Execution accuracy and token efficiency of Rewriter in single-round and multi-round rewriting on the Spider-dev set.
  • Figure 5: Effectiveness of the Checker. Token efficiency vs. precision on the Spider-dev set.
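
The weighted-experience idea in the Figure 3 caption might look something like the sketch below. It is only an illustration: the class and method names are hypothetical, and the additive update with a fixed `step` is an assumption, since the caption does not say how weights are adjusted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    rule: str
    weight: float = 1.0


class ExperienceMemory:
    """Hypothetical store of rewriting experiences ranked by weight."""

    def __init__(self) -> None:
        self.items: List[Experience] = []

    def add(self, rule: str, weight: float = 1.0) -> None:
        self.items.append(Experience(rule, weight))

    def top_k(self, k: int) -> List[Experience]:
        # Highest-weight experiences are the ones loaded into the rules.
        return sorted(self.items, key=lambda e: e.weight, reverse=True)[:k]

    def update(self, used: List[Experience], checker_passed: bool,
               step: float = 0.5) -> None:
        # Assumed rule: reward applied experiences when the Checker accepts
        # the new result, penalize them otherwise (never below zero).
        delta = step if checker_passed else -step
        for e in used:
            e.weight = max(0.0, e.weight + delta)


mem = ExperienceMemory()
mem.add("Name foreign keys explicitly", 1.0)
mem.add("Quote literal column values", 1.0)
mem.add("Expand abbreviations", 0.5)

used = mem.top_k(2)
mem.update(used, checker_passed=False)        # both applied rules are penalized
mem.update([mem.items[2]], checker_passed=True)
```

After these two updates, the third experience outranks the first two, so it would be the one selected on the next rewriting round.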