A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?
Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang, Yuyu Luo
TL;DR
This survey maps the NL2SQL landscape in the era of large language models, detailing a lifecycle that spans pre-processing, translation, and post-processing. It highlights modular, prompt-driven approaches, intermediate representations, and multi-agent collaboration as central trends, and surveys data, benchmarks, evaluation tools, and error taxonomies. The authors provide a practical, data-driven roadmap and decision flows to guide deployment, and they outline open problems such as cross-database open Text-to-SQL, efficiency, explainability, and domain adaptation. An online NL2SQL Handbook is offered as a living resource that tracks evolving challenges and solutions.
Abstract
Translating users' natural language queries (NL) into SQL queries (i.e., Text-to-SQL, a.k.a. NL2SQL) can significantly reduce the barriers to accessing relational databases and support various commercial applications. The performance of Text-to-SQL has been greatly enhanced by the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of LLM-powered Text-to-SQL techniques, covering the entire Text-to-SQL lifecycle from four aspects: (1) Model: Text-to-SQL translation techniques that not only tackle NL ambiguity and under-specification, but also properly map NL to database schemas and instances; (2) Data: from the collection of training data, through data synthesis to address training data scarcity, to Text-to-SQL benchmarks; (3) Evaluation: evaluating Text-to-SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing Text-to-SQL errors to identify their root causes and guide the evolution of Text-to-SQL models. Moreover, we offer a rule of thumb for developing Text-to-SQL solutions. Finally, we discuss the research challenges and open problems of Text-to-SQL in the LLM era. Text-to-SQL Handbook: https://github.com/HKUSTDial/NL2SQL_Handbook
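To make the task and one common evaluation angle concrete, the following is a minimal sketch, not taken from the survey, of execution-based matching (often called execution accuracy): a predicted SQL query counts as correct if it returns the same rows as a gold query on the same database. The toy `employee` table, the question, and the helper `execution_match` are hypothetical illustrations; benchmark implementations typically add details such as order sensitivity for `ORDER BY` queries and execution timeouts.

```python
import sqlite3
from collections import Counter


def execution_match(conn: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> bool:
    """Hypothetical helper: True if both queries return the same multiset of rows.

    Row order is ignored; a predicted query that fails to execute counts as wrong.
    """
    gold_rows = conn.execute(gold_sql).fetchall()
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    return Counter(pred_rows) == Counter(gold_rows)


# Hypothetical toy database held in memory
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
INSERT INTO employee VALUES (1, 'Ana', 'Sales', 52000), (2, 'Bo', 'IT', 61000), (3, 'Cy', 'IT', 58000);
""")

# NL question: "Which IT employees earn more than 60000?"
gold = "SELECT name FROM employee WHERE dept = 'IT' AND salary > 60000"
pred_ok = "SELECT name FROM employee WHERE salary > 60000 AND dept = 'IT'"  # same result
pred_bad = "SELECT name FROM employee WHERE dept = 'IT'"                     # misses the salary filter
pred_err = "SELECT nme FROM employee"                                        # invalid column name

print(execution_match(conn, pred_ok, gold))   # True
print(execution_match(conn, pred_bad, gold))  # False
print(execution_match(conn, pred_err, gold))  # False
```

Because only query results are compared, two semantically different queries can coincide on a small database, which is one reason to evaluate Text-to-SQL methods from multiple angles rather than with a single metric.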
