
EvoSchema: Towards Text-to-SQL Robustness Against Schema Evolution

Tianshu Zhang, Kun Qian, Siddhartha Sahai, Yuan Tian, Shaddy Garg, Huan Sun, Yunyao Li

TL;DR

We present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. It also informs the development of more resilient text-to-SQL systems, in terms of both model training and database design.

Abstract

Neural text-to-SQL models, which translate natural language questions (NLQs) into SQL queries given a database schema, have achieved remarkable performance. However, database schemas frequently evolve to meet new requirements, and such schema evolution often degrades the performance of models trained on static schemas. Existing work either focuses mainly on paraphrasing syntactic or semantic mappings among the NLQ, the database, and the SQL, or lacks a comprehensive and controllable way to study model robustness under schema evolution. Neither is sufficient for the increasingly complex and varied schema changes that occur in practice, especially in the LLM era. To address the challenges posed by schema evolution, we present EvoSchema, a comprehensive benchmark designed to assess and enhance the robustness of text-to-SQL systems under real-world schema changes. EvoSchema introduces a novel schema evolution taxonomy that encompasses ten perturbation types across column-level and table-level modifications, systematically simulating the dynamic nature of database schemas. Using EvoSchema, we conduct an in-depth evaluation of various open-source and closed-source LLMs and find that table-level perturbations have a significantly greater impact on model performance than column-level changes. Furthermore, EvoSchema inspires the development of more resilient text-to-SQL systems, in terms of both model training and database design. Training on EvoSchema's diverse schema designs forces models to distinguish schema differences for the same question, which helps them avoid learning spurious patterns; on average, these models are remarkably more robust than models trained on unperturbed data. This benchmark offers valuable insights into model behavior and a path toward systems that can thrive in dynamic, real-world environments.
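To make the column-level case concrete, the following minimal sketch keeps the natural language question fixed, renames one column, and rewrites the gold SQL. It assumes a toy SQLite table with made-up rows and a hypothetical regex helper `rename_column_in_sql`; neither comes from the paper. The stale query that a model trained on the static schema might still emit no longer executes, while the rewritten gold SQL returns the same answer.

```python
import re
import sqlite3


def rename_column_in_sql(sql: str, old: str, new: str) -> str:
    """Hypothetical helper: rewrite whole-word occurrences of a column name.

    This is a naive regex rewrite for illustration only; a real pipeline
    would need a SQL parser to avoid touching string literals or
    same-named columns in other tables.
    """
    return re.sub(rf"\b{re.escape(old)}\b", new, sql)


def run(conn: sqlite3.Connection, sql: str):
    return conn.execute(sql).fetchall()


# Toy database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, enroll INTEGER)")
conn.executemany(
    "INSERT INTO schools (name, enroll) VALUES (?, ?)",
    [("Alpha", 300), ("Beta", 1200), ("Gamma", 800)],
)

question = "Which schools have more than 500 students?"  # the NLQ never changes
gold_sql = "SELECT name FROM schools WHERE enroll > 500 ORDER BY name"
before = run(conn, gold_sql)

# Column-level perturbation: rename `enroll` to `enrollment_count`.
conn.execute("ALTER TABLE schools RENAME COLUMN enroll TO enrollment_count")

# A model that memorized the old schema would still emit the old query...
try:
    run(conn, gold_sql)
    stale_ok = True
except sqlite3.OperationalError:  # "no such column: enroll"
    stale_ok = False

# ...whereas the perturbed example pairs the same NLQ with a rewritten gold SQL.
new_gold = rename_column_in_sql(gold_sql, "enroll", "enrollment_count")
after = run(conn, new_gold)

print(stale_ok)         # False: the stale query no longer runs
print(after == before)  # True: the answer itself is preserved
```

`ALTER TABLE ... RENAME COLUMN` needs SQLite 3.25 or newer, which current Python builds normally bundle.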

Paper Structure

This paper contains 25 sections, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: (a) Left: overview of the framework used to collect the EvoSchema dataset. (b) Top right: an example of column-level schema evolution. (c) Bottom right: an example of table-level schema evolution.
  • Figure 2: An overview of the different perturbation types in EvoSchema. Top: an unperturbed example from BIRD [bird]; middle: column-level perturbations; bottom: table-level perturbations. "Remove Col in SQL" removes columns that appear in the gold SQL, and "Remove Tables" removes the relevant tables that appear in the gold SQL, so neither case has a gold SQL. "Merge Columns" is not illustrated because this example is not suitable for merging columns.
  • Figure 3: Two examples of the EvoSchema data collection procedure: (a) top, the "rename columns" procedure; (b) bottom, the "split tables" procedure. Blue boxes indicate steps where GPT models are prompted to generate data; "</>" marks steps where the data are processed programmatically.
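The table-level perturbations shown in Figures 1(c) and 2 can be sketched in the same spirit. The snippet below is a minimal illustration that assumes made-up data and hypothetical table names (`schools`, `school_stats`); it is not EvoSchema's collection pipeline, which per Figure 3 combines GPT prompting with programmatic processing. Splitting one table into two leaves the answer unchanged but forces the gold SQL to add a JOIN, and the old single-table query fails outright. For "Remove Tables" there is no gold SQL at all.

```python
import sqlite3
from typing import Optional

# Made-up rows: (id, name, county, enroll)
ROWS = [(1, "Alpha", "Alameda", 300), (2, "Beta", "Fresno", 1200), (3, "Gamma", "Alameda", 800)]


def original_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, county TEXT, enroll INTEGER)")
    conn.executemany("INSERT INTO schools VALUES (?, ?, ?, ?)", ROWS)
    return conn


def split_tables_db() -> sqlite3.Connection:
    """Hypothetical 'Split Tables' perturbation: enrollment moves to its own table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE schools (id INTEGER PRIMARY KEY, name TEXT, county TEXT)")
    conn.execute("CREATE TABLE school_stats (school_id INTEGER REFERENCES schools(id), enroll INTEGER)")
    conn.executemany("INSERT INTO schools VALUES (?, ?, ?)", [r[:3] for r in ROWS])
    conn.executemany("INSERT INTO school_stats VALUES (?, ?)", [(r[0], r[3]) for r in ROWS])
    return conn


QUESTION = "Which schools in Alameda have more than 500 students?"
ORIGINAL_GOLD = "SELECT name FROM schools WHERE county = 'Alameda' AND enroll > 500"
# After the split, answering the same question requires a JOIN.
SPLIT_GOLD = (
    "SELECT s.name FROM schools AS s "
    "JOIN school_stats AS t ON t.school_id = s.id "
    "WHERE s.county = 'Alameda' AND t.enroll > 500"
)
# 'Remove Tables': the table the gold SQL relies on is gone, so no gold SQL exists.
REMOVED_GOLD: Optional[str] = None

orig_rows = original_db().execute(ORIGINAL_GOLD).fetchall()
split_rows = split_tables_db().execute(SPLIT_GOLD).fetchall()

try:
    split_tables_db().execute(ORIGINAL_GOLD).fetchall()  # stale single-table query
    stale_runs = True
except sqlite3.OperationalError:  # "no such column: enroll"
    stale_runs = False

print(orig_rows, split_rows, stale_runs)  # [('Gamma',)] [('Gamma',)] False
```

Compared with the column rename, the rewrite here changes the query's join structure rather than a single identifier, which may help explain why table-level perturbations are harder for models.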