DAC: Decomposed Automation Correction for Text-to-SQL

Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

TL;DR

This paper addresses the correction of text-to-SQL outputs produced by LLMs, where direct correction performs poorly because models struggle to detect mistakes in the generated SQL. It introduces Decomposed Automation Correction (DAC), which decomposes text-to-SQL into two sub-tasks, entity linking and skeleton parsing, and uses the inconsistencies between their results and the initial SQL as feedback for generating an improved query. Across Spider, Bird, and KaggleDBQA, DAC yields an average improvement of $3.7\%$ over the baseline and shows gains across multiple models. Ablations confirm that both sub-tasks contribute to accuracy, while skeleton parsing is often the bottleneck for smaller models. The results indicate that task decomposition is an effective route to error correction in text-to-SQL, with oracle-based evaluations and harder query sets pointing to room for further improvement.

Abstract

Text-to-SQL is an important task that helps people obtain information from databases by automatically generating SQL queries. Given their strong performance, approaches based on Large Language Models (LLMs) have become the mainstream for text-to-SQL. Among these approaches, automated correction further enhances performance by correcting the mistakes in the generated results. Existing correction methods require LLMs to correct the generated SQL directly, while previous research shows that LLMs do not know how to detect mistakes, leading to poor performance. Therefore, in this paper, we propose decomposed correction to enhance text-to-SQL performance. We first demonstrate that decomposed correction outperforms direct correction, since detecting and fixing mistakes with the results of the decomposed sub-tasks is easier than with the SQL itself. Based on this analysis, we introduce Decomposed Automation Correction (DAC), which corrects SQL by decomposing text-to-SQL into entity linking and skeleton parsing. DAC first generates the entities and skeleton corresponding to the question, and then uses the differences between the initial SQL and the generated entities and skeleton as feedback for correction. Experimental results show that our method improves performance by $3.7\%$ on average across Spider, Bird, and KaggleDBQA compared with the baseline method, demonstrating the effectiveness of DAC.
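The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a skeleton is the SQL with identifiers and literal values masked by `_`, and that entity feedback lists linked entities that do not appear in the initial SQL. The keyword list, the tokenizer, the function names, and the `stock`, `idx`, `earnings`, and `revenue` names are all hypothetical; in DAC the linked entities and the parsed skeleton would come from an LLM rather than being hard-coded.

```python
import re

# Small, non-exhaustive keyword list (an assumption for this sketch).
SQL_KEYWORDS = {
    "select", "from", "where", "group", "by", "order", "having", "limit",
    "join", "on", "as", "and", "or", "not", "in", "like", "between",
    "distinct", "asc", "desc", "count", "sum", "avg", "min", "max",
    "union", "intersect", "except",
}

# String literals, numbers, (dotted) identifiers, two-character operators,
# and any other single symbol.
TOKEN_RE = re.compile(
    r"'[^']*'|\d+(?:\.\d+)?|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*|[<>!]=|<>|[^\s\w]"
)


def skeleton_of(sql):
    """Keep keywords and operators; mask identifiers and literals with '_'."""
    out = []
    for tok in TOKEN_RE.findall(sql):
        if tok.lower() in SQL_KEYWORDS:
            out.append(tok.upper())
        elif tok[0] == "'" or tok[0] == "_" or tok[0].isalnum():
            out.append("_")
        else:
            out.append(tok)
    return " ".join(out)


def entities_of(sql):
    """Lowercased table/column-like names that appear in the SQL."""
    found = set()
    for tok in TOKEN_RE.findall(sql):
        is_identifier = tok[0].isalpha() or tok[0] == "_"
        if is_identifier and tok.lower() not in SQL_KEYWORDS:
            found.update(part.lower() for part in tok.split("."))
    return found


def compare(initial_sql, linked_entities, parsed_skeleton):
    """Return feedback messages describing inconsistencies with the sub-task results."""
    feedback = []
    missing = sorted({e.lower() for e in linked_entities} - entities_of(initial_sql))
    if missing:
        feedback.append("Linked entities not used in the SQL: " + ", ".join(missing))
    initial_skeleton = skeleton_of(initial_sql)
    if initial_skeleton != parsed_skeleton:
        feedback.append(
            f"SQL skeleton '{initial_skeleton}' differs from "
            f"the parsed skeleton '{parsed_skeleton}'"
        )
    return feedback


# Question: "Order the stock idx with earnings more than 5,000"
initial_sql = "SELECT idx FROM stock WHERE revenue > 5000"
for message in compare(
    initial_sql,
    linked_entities=["stock", "idx", "earnings"],
    parsed_skeleton="SELECT _ FROM _ WHERE _ > _ ORDER BY _",
):
    print(message)
```

In a full pipeline, the returned messages would be placed in the correction prompt together with the question and the initial SQL. An empty list would mean the initial SQL is consistent with both sub-task results.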

Paper Structure

This paper contains 34 sections, 5 figures, and 13 tables.

Figures (5)

  • Figure 1: The comparison between direct correction and decomposed correction. Direct correction shows poor performance since LLMs do not know how to detect mistakes. Decomposed correction brings better performance since it is easier to detect mistakes from the decomposed tasks of entity linking and skeleton parsing.
  • Figure 2: An illustration of our discussion, taking the question "Order the stock idx with earnings more than 5,000" as an example. With direct correction (left part), it is challenging for LLMs to pinpoint specific mistakes. After decomposing the SQL into sub-tasks (right part), it is easier for LLMs to identify and correct the mistakes.
  • Figure 3: The pipeline of DAC, which consists of five steps: (i) SQL Generation: Generate the initial SQL; (ii) Entity Linking: Detect the question-related entities; (iii) Skeleton Parsing: Generate the SQL skeleton of the question; (iv) Comparison: Determine the inconsistencies between the initial SQL and the linked entities and parsed skeletons as feedback; (v) Correction: Correct the SQL with the comparison feedback.
  • Figure 4: A case study with and without DAC on the Spider dev set using Llama3-70b. The mistake in the SQL is marked in bold.
  • Figure 5: The error analysis with and without DAC. #Example denotes the average number of error examples across both models (Llama3, Deepseek-Coder) and all three datasets (Spider, Bird, and KaggleDBQA). A table or column error means that the generated SQL lacks a table or column that appears in the correct SQL. A skeleton error means that the generated SQL does not have the correct skeleton. An execution error means that the generated SQL cannot be executed. A single example may have multiple errors.
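
The error categories in the Figure 5 caption can be made concrete with a rough sketch. It assumes SQLite as the execution engine and a schema given as a mapping from table names to column lists. The function names, the schema format, the masking rule for skeletons, and the example tables are hypothetical and are not taken from the paper's evaluation code.

```python
import re
import sqlite3


def identifiers(sql):
    """Lowercased identifier-like tokens, ignoring quoted string literals."""
    without_strings = re.sub(r"'[^']*'", " ", sql)
    return {t.lower() for t in re.findall(r"[A-Za-z_]\w*", without_strings)}


def skeleton(sql, schema_names):
    """Mask literals and schema names with '_' and uppercase everything else."""
    masked = re.sub(r"'[^']*'|\b\d+(?:\.\d+)?\b", "_", sql)
    masked = re.sub(
        r"[A-Za-z_]\w*",
        lambda m: "_" if m.group().lower() in schema_names else m.group().upper(),
        masked,
    )
    return " ".join(masked.split())


def error_types(pred_sql, gold_sql, schema, conn):
    """Label one prediction; a single example may receive several labels."""
    tables = {t.lower() for t in schema}
    columns = {c.lower() for cols in schema.values() for c in cols}
    pred_ids, gold_ids = identifiers(pred_sql), identifiers(gold_sql)

    errors = set()
    if (gold_ids & tables) - pred_ids:
        errors.add("table")      # a table of the correct SQL is missing
    if (gold_ids & columns) - pred_ids:
        errors.add("column")     # a column of the correct SQL is missing
    names = tables | columns
    if skeleton(pred_sql, names) != skeleton(gold_sql, names):
        errors.add("skeleton")   # the masked structure differs
    try:
        conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        errors.add("execution")  # the prediction cannot be executed
    return errors


# Hypothetical schema and in-memory database for the Figure 2 question.
schema = {"stock": ["idx", "earnings", "revenue"]}
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (idx INTEGER, earnings REAL, revenue REAL)")

gold = "SELECT idx FROM stock WHERE earnings > 5000 ORDER BY idx"
pred = "SELECT idx FROM stock WHERE revenue > 5000"
print(sorted(error_types(pred, gold, schema, conn)))
```

Comparing whitespace-normalized strings is a simplification: two queries that differ only in spacing around operators would be reported as a skeleton error, so a real analysis would compare parsed structures instead.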