A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?

Xinyu Liu; Shuyu Shen; Boyan Li; Peixian Ma; Runzhi Jiang; Yuxin Zhang; Ju Fan; Guoliang Li; Nan Tang; Yuyu Luo

TL;DR

This survey maps the NL2SQL landscape in the era of large language models, detailing a lifecycle that spans pre-processing, translation, and post-processing. It highlights modular, prompt-driven approaches, intermediate representations, and multi-agent collaborations as central trends, and surveys data, benchmarks, evaluation tools, and error taxonomies. The authors provide a practical data-driven roadmap and decision flows to guide deployment, while outlining open problems such as cross-database open Text-to-SQL, efficiency, explainability, and domain adaptation. An online NL2SQL Handbook is offered as a living resource to track evolving challenges and solutions.

Abstract

Translating users' natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of Text-to-SQL techniques powered by LLMs, covering the entire lifecycle from the following four aspects: (1) Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification but also properly map NL to the database schema and instances; (2) Data: from the collection of training data and data synthesis to address training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to find their root causes and guide Text-to-SQL models to evolve. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLM era. Text-to-SQL Handbook: https://github.com/HKUSTDial/NL2SQL_Handbook

Paper Structure (54 sections, 6 equations, 11 figures, 2 tables)

Text-to-SQL Problem and Background
Problem Formulation
Text-to-SQL Human Workflow
Text-to-SQL Task Challenges
Challenges Solving with Large Language Models
Rule-based Stage
Neural Network-based Stage
PLM-based Stage
LLM-based Stage
Language Model-powered Text-to-SQL Overview
Pre-Processing Strategies for Text-to-SQL
Schema Linking
String Matching-based Schema Linking
Neural Network-based Schema Linking
In-Context Learning for Schema Linking
...and 39 more sections

Figures (11)

Figure 1: Examples of the Text-to-SQL Task and Its Challenges.
Figure 2: The Evolution of Text-to-SQL Solutions from the Perspective of Language Models.
Figure 3: The Categorization of PLM and LLM in Text-to-SQL.
Figure 4: An Overview of Text-to-SQL Modules in LLM Era.
Figure 5: Comparisons of Existing Text-to-SQL Solutions.
...and 6 more figures

Theorems & Definitions (1)

Definition 1: Natural Language to SQL (Text-to-SQL)
