
SQL-to-Schema Enhances Schema Linking in Text-to-SQL

Sun Yang, Qiong Su, Zhishuai Li, Ziyue Li, Hangyu Mao, Chenxi Liu, Rui Zhao

TL;DR

This paper introduces SQL-to-Schema, a two-step paradigm for Text-to-SQL that first generates an initial SQL query using the full database schema and then extracts a concise linking schema from that query. By defining table-recall@4 and leveraging modular prompting with CodeLlama-34B and GPT-4, the approach improves both schema linking and end-to-end SQL generation on the Spider dataset, achieving state-of-the-art linking quality and competitive execution accuracy without extensive fine-tuning. The method demonstrates that extracting linking schemas from an initial SQL query and then regenerating the query can yield substantial gains with lightweight prompts, highlighting the potential of prompt-based, schema-aware strategies for cross-domain Text-to-SQL.
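The TL;DR names table-recall@4 but does not define it here. As a rough illustration only, the sketch below assumes that table-recall@k counts a question as a hit when every gold table appears among the first k tables of the predicted linking schema, and averages hits over questions. The function name `table_recall_at_k` and this exact definition are assumptions, not the paper's confirmed formula; only the default k = 4 comes from the text.

```python
from typing import Sequence


def table_recall_at_k(
    predicted: Sequence[Sequence[str]],
    gold: Sequence[Sequence[str]],
    k: int = 4,
) -> float:
    """Share of questions whose gold tables all appear in the first k predicted tables.

    ASSUMPTION: one plausible reading of table-recall@k, not the paper's
    confirmed definition. Table names are compared case-insensitively.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = 0
    for pred_tables, gold_tables in zip(predicted, gold):
        top_k = {t.lower() for t in pred_tables[:k]}
        # A question is a hit only if no gold table is missing from the top k.
        if {t.lower() for t in gold_tables} <= top_k:
            hits += 1
    return hits / len(gold)


# Hypothetical predictions and gold tables for three questions.
predicted = [["singer", "concert", "stadium"], ["pets"], ["a", "b", "c", "d", "e"]]
gold = [["singer", "concert"], ["pets", "student"], ["e"]]
print(table_recall_at_k(predicted, gold, k=4))  # prints 0.3333333333333333
```

Under this reading, only the first question is a hit: the second misses `student`, and in the third the gold table `e` sits in fifth place, outside the top 4.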

Abstract

Even sophisticated existing Text-to-SQL methods exhibit errors in varying proportions, including schema-linking errors (incorrect columns, incorrect tables, or extra columns), join errors, nested-query errors, and group-by errors. Consequently, there is a critical need for schema linking: filtering out unnecessary tables and columns so that the language model's attention is directed to the relevant ones, which reduces errors during SQL generation. Previous approaches have ranked tables and columns by their relevance to the question and kept the top-ranked ones, or have directly identified the tables and columns needed for SQL generation. However, these methods suffer from long model training times, high consumption of expensive GPT-4 tokens in few-shot prompts, or suboptimal schema-linking performance. We therefore propose a novel two-step schema linking method: first, generate an initial SQL query using the complete database schema; then, extract the tables and columns from that initial query to create a concise schema. With CodeLlama-34B, SQL generation using our schema outperforms SQL generation using the schemas obtained by mainstream methods. With GPT-4, our SQL generation method achieves results comparable to mainstream Text-to-SQL methods on the Spider dataset.
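The two steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the schema is a plain `{table: [columns]}` dictionary, it finds referenced identifiers with a regular-expression token match instead of a real SQL parser, and it assumes the final query is regenerated from the concise schema. `extract_linking_schema`, `sql_to_schema_pipeline` and the `generate_sql` callable (a stand-in for a prompted CodeLlama-34B or GPT-4 call) are hypothetical names.

```python
import re
from typing import Callable, Dict, List

Schema = Dict[str, List[str]]  # hypothetical format: table name -> column names

# Quoted literals are removed before matching. Simplification: double-quoted
# text is also treated as a value (as in many Spider queries), although
# SQLite can use double quotes for identifiers.
LITERAL_RE = re.compile(r"'[^']*'" + r'|"[^"]*"')
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def extract_linking_schema(sql: str, schema: Schema) -> Schema:
    """Keep only the tables, and their columns, that the SQL text mentions."""
    tokens = {t.lower() for t in IDENT_RE.findall(LITERAL_RE.sub(" ", sql))}
    linked: Schema = {}
    for table, columns in schema.items():
        if table.lower() in tokens:
            # Aliased references such as T1.name resolve through the bare
            # column name, so a shared column name is kept for every used table.
            linked[table] = [c for c in columns if c.lower() in tokens]
    return linked


def sql_to_schema_pipeline(
    question: str,
    schema: Schema,
    generate_sql: Callable[[str, Schema], str],
) -> str:
    """Step 1: draft SQL with the full schema. Step 2: regenerate with the linked schema."""
    initial_sql = generate_sql(question, schema)
    linked = extract_linking_schema(initial_sql, schema)
    # Fall back to the full schema if no table could be linked.
    return generate_sql(question, linked or schema)


schema = {
    "singer": ["singer_id", "name", "country", "age"],
    "concert": ["concert_id", "concert_name", "stadium_id", "year"],
    "singer_in_concert": ["concert_id", "singer_id"],
    "stadium": ["stadium_id", "location", "capacity"],
}
sql = ('SELECT T1.name FROM singer AS T1 JOIN singer_in_concert AS T2 '
       'ON T1.singer_id = T2.singer_id WHERE T1.country = "France"')
print(extract_linking_schema(sql, schema))
# {'singer': ['singer_id', 'name', 'country'], 'singer_in_concert': ['singer_id']}

calls = []


def fake_llm(question: str, schema: Schema) -> str:
    """Stub model: records which tables it was shown and returns a fixed query."""
    calls.append(sorted(schema))
    return sql


final = sql_to_schema_pipeline(
    "Names of singers from France who performed in a concert?", schema, fake_llm
)
```

With the stub, the first call sees all four tables and the second sees only `singer` and `singer_in_concert`, which is the narrowing effect the method relies on. Token matching keeps the sketch dependency-free; a SQL parser would handle aliases, subqueries and quoted identifiers more robustly.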

Paper Structure

This paper contains 12 sections, 3 figures, 4 tables.

Figures (3)

  • Figure 1: The complete example of SQL-to-Schema
  • Figure 2: The complete schema linking and Text-to-SQL algorithm framework.
  • Figure 3: The single schema prompt error examples.