Table of Contents
Fetching ...

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

Wenxuan Xie, Gaochen Wu, Bowen Zhou

Abstract

Recent In-Context Learning based methods have achieved remarkable success in Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposition methods, and the schema linking methods used in these works are very rudimentary. To address these issues, we propose MAG-SQL, a multi-agent generative approach with soft schema linking and iterative Sub-SQL refinement. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions. Additionally, we build a iterative generating module which includes a Sub-SQL Generator and Sub-SQL Refiner, introducing external oversight for each step of generation. Through a series of ablation studies, the effectiveness of each agent in our framework has been demonstrated. When evaluated on the BIRD benchmark with GPT-4, MAG-SQL achieves an execution accuracy of 61.08%, compared to the baseline accuracy of 46.35% for vanilla GPT-4 and the baseline accuracy of 57.56% for MAC-SQL. Besides, our approach makes similar progress on Spider. The codes are available at https://github.com/LancelotXWX/MAG-SQL.

MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

Abstract

Recent In-Context Learning based methods have achieved remarkable success in Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposition methods, and the schema linking methods used in these works are very rudimentary. To address these issues, we propose MAG-SQL, a multi-agent generative approach with soft schema linking and iterative Sub-SQL refinement. In our framework, an entity-based method with tables' summary is used to select the columns in database, and a novel targets-conditions decomposition method is introduced to decompose those complex questions. Additionally, we build a iterative generating module which includes a Sub-SQL Generator and Sub-SQL Refiner, introducing external oversight for each step of generation. Through a series of ablation studies, the effectiveness of each agent in our framework has been demonstrated. When evaluated on the BIRD benchmark with GPT-4, MAG-SQL achieves an execution accuracy of 61.08%, compared to the baseline accuracy of 46.35% for vanilla GPT-4 and the baseline accuracy of 57.56% for MAC-SQL. Besides, our approach makes similar progress on Spider. The codes are available at https://github.com/LancelotXWX/MAG-SQL.
Paper Structure (41 sections, 2 equations, 14 figures, 4 tables)

This paper contains 41 sections, 2 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: A hard example in BIRD. Each data in the BIRD dataset consists of a natural language questions, the database schema, related external knowledge, and gold SQL. The difficulties corresponding to these four parts are listed above.
  • Figure 2: The overview of our MAG-SQL workflow which consists of four agents: (i)Soft Schema Linker, which picks out related schema and construct database schema prompt based on soft selection, (ii)Targets-Conditions Decomposer, which decomposes the question at a fine-grained level, (iii)Sub-SQL Generator, which predict the next Sub-SQL based on the previous Sub-SQL each time, and (iv)Sub-SQL Refiner, which correct the wrong SQL based on the self-refine ability of LLMs.
  • Figure 3: Output Format of the Summary
  • Figure 4: Output Format of Linked Schema
  • Figure 5: Example of Soft Schema
  • ...and 9 more figures