SelECT-SQL: Self-correcting ensemble Chain-of-Thought for Text-to-SQL

Ke Shen; Mayank Kejriwal

TL;DR

This work introduces SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought prompting, self-correction, and ensemble methods to yield a new state-of-the-art result on challenging Text-to-SQL benchmarks.

Abstract

In recent years, Text-to-SQL, the problem of automatically converting questions posed in natural language into formal SQL queries, has emerged as an important problem at the intersection of natural language processing and data management research. Large language models (LLMs) have delivered impressive performance when used in an off-the-shelf manner, but still fall significantly short of expected expert-level performance. Errors are especially likely when correct conversion requires a nuanced understanding of database schemas, questions, and SQL clauses. We introduce SelECT-SQL, a novel in-context learning solution that uses an algorithmic combination of chain-of-thought (CoT) prompting, self-correction, and ensemble methods to yield a new state-of-the-art result on challenging Text-to-SQL benchmarks. Specifically, when configured with GPT-3.5-Turbo as the base LLM, SelECT-SQL achieves 84.2% execution accuracy on the Spider leaderboard's development set, exceeding both the best result of other GPT-3.5-Turbo-based baseline solutions (81.1%) and the peak performance (83.5%) of GPT-4 reported on the leaderboard.
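The headline numbers above are execution accuracies: a predicted query counts as correct when running it against the database returns the same result as the gold query. The following is a minimal sketch of that idea, not the official Spider evaluation script. The helper names (`run_query`, `execution_accuracy`, `build_toy_db`) and the toy `singer` table are assumptions made for illustration, and the comparison deliberately ignores row order, which a stricter evaluator would respect for `ORDER BY` queries.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Run sql and return its rows as an order-insensitive multiset, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None  # invalid SQL is treated as a failed prediction
    return Counter(rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted_sql, gold_sql) pairs whose result multisets match."""
    correct = 0
    for predicted, gold in pairs:
        gold_rows = run_query(conn, gold)
        pred_rows = run_query(conn, predicted)
        if gold_rows is not None and pred_rows == gold_rows:
            correct += 1
    return correct / len(pairs) if pairs else 0.0


def build_toy_db():
    """Hypothetical in-memory database used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [(1, "Ana", 31), (2, "Bo", 45), (3, "Cy", 27)],
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_db()
    pairs = [
        # Different SQL text, same result: counted as correct.
        ("SELECT name FROM singer WHERE age > 30", "SELECT name FROM singer WHERE age >= 31"),
        # Misspelled column: the query fails to execute, counted as incorrect.
        ("SELECT nme FROM singer", "SELECT name FROM singer"),
    ]
    print(execution_accuracy(conn, pairs))
```

Because the comparison is on results rather than on query text, two differently written queries that solve the same question both count as correct, which is why execution accuracy is more forgiving than exact string matching.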

Paper Structure (29 sections, 3 equations, 8 figures, 2 tables, 1 algorithm)

Introduction
SelECT-SQL
Chain-of-thought Prompting
Structure-synthesis CoT
Modular-synthesis CoT
Self-correction
Ensemble refinement
Experiments
Datasets
Metrics
Prompting
Baselines
Results
Overall execution accuracy
Ablation Study
...and 14 more sections

Figures (8)

Figure 1: Examples of structure-synthesis (SS) and modular-synthesis (MS) CoT prompting.
Figure 2: Illustration of the Self-Correction Component.
Figure 3: Execution accuracy of baselines and GPT-3.5-Turbo implemented with SelECT-SQL on Spider dev set. The DAIL-SQL baseline method, highlighted with a red border, employs the GPT-4 engine, while other baselines utilize zero-shot or few-shot prompted GPT-3.5-Turbo or fine-tuned T5. The left side shows the overall accuracy of both our methods and the baselines, while the right side presents the detailed performance of SelECT-SQL across different databases in Spider-dev.
Figure 4: Error Distribution in ChatGPT-Generated Queries on the Spider-dev set. We report the partial clause component matching accuracy on the left side as a reference. Note that the partial accuracy is loosely correlated with the error portion shown in the distribution pie chart, as the missing matching of partial clauses may not result in an incorrect query. There may exist several correct queries that can solve the same question.
Figure 5: Objective paraphrasing prompt for SS CoT: Above, we provide the schema and question representation along with the detailed prompting message. An example response is shown on the right.
...and 3 more figures
