A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?

Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo

TL;DR

This survey maps the NL2SQL landscape in the era of large language models, detailing a lifecycle that spans pre-processing, translation, and post-processing. It highlights modular, prompt-driven approaches, intermediate representations, and multi-agent collaborations as central trends, and surveys data, benchmarks, evaluation tools, and error taxonomies. The authors provide a practical data-driven roadmap and decision flows to guide deployment, while outlining open problems such as cross-database open Text-to-SQL, efficiency, explainability, and domain adaptation. An online NL2SQL Handbook is offered as a living resource to track evolving challenges and solutions.

Abstract

Translating users' natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of Text-to-SQL techniques powered by LLMs, covering the entire lifecycle from the following four aspects: (1) Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification, but also properly map NL to the database schema and instances; (2) Data: from the collection of training data and data synthesis to address training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to find their root causes and guide the evolution of Text-to-SQL models. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLM era. Text-to-SQL Handbook: https://github.com/HKUSTDial/NL2SQL_Handbook

Paper Structure

This paper contains 54 sections, 6 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Examples of the Text-to-SQL Task and Its Challenges.
  • Figure 2: The Evolution of Text-to-SQL Solutions from the Perspective of Language Models.
  • Figure 3: The Categorization of PLM and LLM in Text-to-SQL.
  • Figure 4: An Overview of Text-to-SQL Modules in LLM Era.
  • Figure 5: Comparisons of Existing Text-to-SQL Solutions.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Definition 1: Natural Language to SQL (Text-to-SQL)