Text-to-SQL Domain Adaptation via Human-LLM Collaborative Data Annotation

Yuan Tian, Daniel Lee, Fei Wu, Tung Mai, Kun Qian, Siddhartha Sahai, Tianyi Zhang, Yunyao Li

TL;DR

Domain shift and data scarcity hinder practical deployment of text-to-SQL systems on new schemas. The authors introduce SQLsynth, an interactive, human-in-the-loop annotation system that combines a PCFG-based SQL sampler, LLM-assisted NL generation, step-by-step explanations, alignment-based error repair, and dataset-diversity analysis, all wrapped in an extensible UI with schema visualization. A within-subjects study with 12 participants shows that SQLsynth increases annotation throughput while reducing errors, improving naturalness, and enhancing diversity compared with manual annotation or a ChatGPT-only workflow. The work demonstrates that structured human-LLM collaboration can efficiently produce high-quality, schema-specific NL-to-SQL datasets, enabling focused model fine-tuning and robust domain evaluation for real-world deployments.
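
To make the idea of a PCFG-based SQL sampler concrete, the following is a minimal sketch of sampling from a probabilistic context-free grammar over a schema. It is not the SQLsynth implementation: the grammar rules, their probabilities, the `?` value placeholder, and the names `GRAMMAR`, `sample`, and `sample_query` are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy grammar (not from the paper): each nonterminal maps to a
# list of (probability, expansion) pairs whose probabilities sum to 1.
GRAMMAR = {
    "QUERY": [(0.6, ["SELECT", "COLS", "FROM", "TABLE"]),
              (0.4, ["SELECT", "COLS", "FROM", "TABLE", "WHERE", "COND"])],
    "COLS": [(0.5, ["COL"]), (0.3, ["COL", ",", "COL"]), (0.2, ["COUNT(*)"])],
    "COND": [(0.7, ["COL", "OP", "VALUE"]),
             (0.3, ["COL", "OP", "VALUE", "AND", "COND"])],
    "OP": [(0.5, ["="]), (0.25, [">"]), (0.25, ["<"])],
    "VALUE": [(1.0, ["?"])],  # placeholder; a real system would draw values
}


def sample(symbol, schema, table, rng):
    """Recursively expand one grammar symbol into a list of SQL tokens."""
    if symbol == "TABLE":
        return [table]
    if symbol == "COL":
        # Columns come from the schema, so queries stay schema-specific.
        return [rng.choice(schema[table])]
    if symbol not in GRAMMAR:
        return [symbol]  # terminal token such as SELECT or FROM
    weights = [p for p, _ in GRAMMAR[symbol]]
    expansions = [e for _, e in GRAMMAR[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    tokens = []
    for child in chosen:
        tokens.extend(sample(child, schema, table, rng))
    return tokens


def sample_query(schema, seed=None):
    """Sample one single-table query; schema maps table name -> column names."""
    rng = random.Random(seed)
    table = rng.choice(sorted(schema))
    return " ".join(sample("QUERY", schema, table, rng))


if __name__ == "__main__":
    demo_schema = {"users": ["id", "name"], "orders": ["id", "total"]}
    for s in range(3):
        print(sample_query(demo_schema, seed=s))
```

Adjusting the rule probabilities shifts the distribution of generated query shapes, which is one way a sampler of this kind could be steered toward under-represented SQL structures.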

Abstract

Text-to-SQL models, which parse natural language (NL) questions to executable SQL queries, are increasingly adopted in real-world applications. However, deploying such models in the real world often requires adapting them to the highly specialized database schemas used in specific applications. We find that existing text-to-SQL models experience significant performance drops when applied to new schemas, primarily due to the lack of domain-specific data for fine-tuning. This data scarcity also limits the ability to effectively evaluate model performance in new domains. Continuously obtaining high-quality text-to-SQL data for evolving schemas is prohibitively expensive in real-world scenarios. To bridge this gap, we propose SQLsynth, a human-in-the-loop text-to-SQL data annotation system. SQLsynth streamlines the creation of high-quality text-to-SQL datasets through human-LLM collaboration in a structured workflow. A within-subjects user study comparing SQLsynth with manual annotation and ChatGPT shows that SQLsynth significantly accelerates text-to-SQL data annotation, reduces cognitive load, and produces datasets that are more accurate, natural, and diverse. Our code is available at https://github.com/magic-YuanTian/SQLsynth.

Paper Structure

This paper contains 47 sections, 17 figures, and 7 tables.

Figures (17)

  • Figure 1: Illustration of traditional text-to-SQL annotation vs. annotation with SQLsynth. Green arrows indicate dataset creation steps, gray arrows represent supportive data flows, and blue dashed arrows show user interactions with the interface. SQLsynth includes three interactive features in the blue box: schema visualization, error detection and repair, and diversity analysis. Annotators can leverage these features to efficiently control the data annotation process.
  • Figure 2: The user interface for schema visualization. Each node represents a database table, while each cell represents a column in the table. The blue cell marked with "PK" represents the primary key. The dashed gray edge represents a foreign key reference relationship between two tables. Users can (a) add a new table, (b) add a new column, (c) add a reference relationship, (d) define the data type for a column, (e) add a description for columns and tables, and (f) remove, upload, or download the database schema.
  • Figure 3: The user interface for database population. Users can (a) populate the database with a specified number of records, (b) switch table views, and (c) upload or download synthesized records.
  • Figure 4: The user interface for data generation, error detection, and repair. Users can (a) generate a suggested SQL query, (b) check the query result, (c) read the step-by-step explanation in natural language, (d) generate the corresponding suggested NL question, (e) check similar gold data, (f) hover on each step to highlight the corresponding SQL component, NL question chunk, and sub-question, (g) build alignments among SQL, question, and steps, (i) identify and remove redundant text in the question, (j) update the question by emphasizing a certain step, (h) identify a misaligned step, and (k) collect annotated data.
  • Figure 5: The user interface for post-synthesis analysis & automated annotation. Users can (a) generate an analysis report and scoring for annotating the current data pair, (b) accept or reject the current data pair, and (c) start automated data annotation without human intervention.
  • ...and 12 more figures