ErrorLLM: Modeling SQL Errors for Text-to-SQL Refinement

Zijin Hong, Hao Chen, Zheng Yuan, Qinggang Zhang, Luyao Zhuang, Qing Liao, Feiran Huang, Yangqiu Song, Xiao Huang

TL;DR

ErrorLLM is a framework that explicitly models text-to-SQL errors within a dedicated LLM for text-to-SQL refinement; it achieves the largest improvements over the backbone's initial generation.

Abstract

Despite the remarkable performance of large language models (LLMs) in text-to-SQL (SQL generation), producing correct SQL queries remains challenging during initial generation. The SQL refinement task was therefore introduced to correct syntactic and semantic errors in generated SQL queries. However, existing paradigms face two major limitations: (i) self-debugging becomes increasingly ineffective because modern LLMs rarely produce explicit execution errors that can trigger debugging signals; (ii) self-correction exhibits low detection precision due to the lack of explicit error modeling grounded in the question and schema, and suffers from severe hallucination that frequently corrupts correct SQL queries. In this paper, we propose ErrorLLM, a framework that explicitly models text-to-SQL errors within a dedicated LLM for text-to-SQL refinement. Specifically, we represent the user question and database schema as structural features, employ static detection to identify execution failures and surface mismatches, and extend ErrorLLM's semantic space with dedicated error tokens that capture categorized implicit semantic error types. Through a well-designed training strategy, we explicitly model these errors with structural representations, enabling the LLM to detect complex implicit errors by predicting dedicated error tokens. Guided by the detected errors, we perform error-guided refinement on the SQL structure by prompting LLMs. Extensive experiments demonstrate that ErrorLLM achieves the largest improvements over the backbone's initial generation. Further analysis reveals that detection quality directly determines refinement effectiveness, and ErrorLLM addresses both by achieving a high detection F1 score while maintaining refinement effectiveness.
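
The contrast between explicit execution failures and implicit semantic errors can be illustrated with a minimal sketch. This is not the authors' static detector: it assumes a SQLite backend, and the function name `static_detect` and the toy `singer` schema are hypothetical, chosen only for illustration. It runs a candidate query against an empty in-memory database built from the schema and reports any engine error.

```python
import sqlite3
from typing import Optional


def static_detect(sql: str, schema_ddl: str) -> Optional[str]:
    """Return the engine's error message if `sql` fails to run against an
    empty database built from `schema_ddl`; return None if it runs.

    Hypothetical helper for illustration; not ErrorLLM's implementation.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)   # create empty tables from the schema
        conn.execute(sql).fetchall()     # syntax and schema errors surface here
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


# Toy schema (assumed for the example).
schema = "CREATE TABLE singer (id INTEGER, name TEXT);"

# A reference to a missing column is an explicit failure and is caught.
missing_column = static_detect("SELECT age FROM singer", schema)

# A query that runs but answers the wrong question (say the user asked for
# names) yields no execution signal at all.
wrong_but_runs = static_detect("SELECT id FROM singer", schema)
```

In this sketch the second query passes silently, which is the gap the abstract attributes to self-debugging: an execution check alone cannot flag SQL that runs but is semantically wrong, hence the need for a model that predicts error types from the question and schema.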

Paper Structure

This paper contains 31 sections, 47 equations, 8 figures, 10 tables, 2 algorithms.

Figures (8)

  • Figure 1: Different paradigms of text-to-SQL refinement. Self-debugging misses the incorrect SQL since there is no execution failure. Self-correction conducts correction on both SQLs but cannot successfully fix the error.
  • Figure 2: An illustration of SQL error modeling in ErrorLLM and an overview of the SQL error detection and error-guided text-to-SQL refinement process. The example is taken from the BIRD (Li et al., 2023) development set and refined by our workflow.
  • Figure 3: Text-to-SQL execution accuracy (EX) (%) on Spider variants, based on GPT-4o generated SQLs. Values marked with ↑ indicate improvements in EX.
  • Figure 4: Comparison of text-to-SQL refinement methods by detection depth on GPT-4o generated SQLs from the BIRD development set. $^\diamondsuit$ Self-correction (Pourreza and Rafiei, 2023) uses the same number of TP as ErrorLLM for direct comparison.
  • Figure 5: SQL error detection results of ErrorLLM and proprietary LLM baselines on NL2SQL-Bugs (Liu et al., 2025), evaluated per error category using type-specific accuracy (TSA).
  • ...and 3 more figures