
A Survey on Employing Large Language Models for Text-to-SQL Tasks

Liang Shi, Zhengju Tang, Nan Zhang, Xiaotong Zhang, Zhi Yang

TL;DR

The paper surveys the rise of large language models for Text-to-SQL, distinguishing prompt engineering and finetuning as the two main strategies. It catalogs benchmarks and metrics, analyzes prompt designs, schema linking, and reasoning workflows, and reviews base-model choices (open-source vs. closed-source) and training data. Key contributions include a systematic taxonomy of LLM-based Text-to-SQL pipelines, a synthesis of benchmarking studies, and a forward-looking discussion on privacy, domain knowledge, and autonomous agents. The work highlights practical pathways and challenges for deploying LLM-driven Text-to-SQL in real-world, enterprise-scale databases.

Abstract

With the development of Large Language Models (LLMs), a wide range of LLM-based Text-to-SQL (Text2SQL) methods have emerged. This survey provides a comprehensive review of LLM-based Text2SQL studies. We first enumerate classic benchmarks and evaluation metrics. For the two mainstream methods, prompt engineering and finetuning, we introduce a comprehensive taxonomy and offer practical insights into each subcategory. We then present an overall analysis of these methods and of various models evaluated on well-known datasets, and extract some of their characteristics. Finally, we discuss the challenges and future directions in this field.

Paper Structure

This paper contains 43 sections, 4 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Framework of employing LLMs in Text-to-SQL
  • Figure 2: The evolution of Text-to-SQL approach over time
  • Figure 3: Taxonomy of Prompt Engineering
  • Figure 4: The framework of "Question Representation". It usually contains three sequential parts. The Layout part includes the question itself and the database structure. The Data part includes data sampled from the actual database content. The Knowledge part includes related evidence, such as SQL-related prior knowledge and other knowledge from the external world. The details in the figure are for illustration only.
  • Figure 5: An overview of schema linking methods used in LLM-based Text-to-SQL papers. (a), (b), (c), and (d) are LLM-based schema linking methods, while (e) and (f) are traditional schema linking methods. (a) and (b) correspond to prompting LLMs in specific steps designed for schema linking. (c) corresponds to using SQL to guide schema linking. (d) corresponds to enhancing LLM-based schema linking performance with general LLM techniques. (e) and (f) are similarity methods and connectivity methods, respectively.
  • ...and 3 more figures
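The three-part "Question Representation" in Figure 4 (Layout, then Data, then Knowledge) can be sketched as a small prompt builder. This is a minimal illustration under stated assumptions, not the survey's own method: the function `build_prompt`, the SQL-comment layout, the toy `singer` table, and the evidence string are all hypothetical, and Python's standard `sqlite3` module stands in for a real database.

```python
import sqlite3


def build_prompt(conn: sqlite3.Connection, question: str,
                 knowledge: list[str], rows_per_table: int = 2) -> str:
    """Assemble a Layout -> Data -> Knowledge prompt (hypothetical format)."""
    tables = conn.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()

    # Layout: the database structure (CREATE TABLE statements) plus the question.
    layout = ["-- Database schema"] + [ddl + ";" for _, ddl in tables]
    layout.append(f"-- Question: {question}")

    # Data: a few rows sampled from each table's actual content.
    data = ["-- Sample rows"]
    for name, _ in tables:
        cur = conn.execute(f'SELECT * FROM "{name}" LIMIT ?', (rows_per_table,))
        cols = [d[0] for d in cur.description]
        for row in cur.fetchall():
            pairs = ", ".join(f"{c}={v!r}" for c, v in zip(cols, row))
            data.append(f"-- {name}: {pairs}")

    # Knowledge: external evidence such as SQL hints or domain definitions.
    know = ["-- Evidence"] + [f"-- {k}" for k in knowledge]

    return "\n".join(layout + data + know)


# Usage with a hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer (name, age) VALUES (?, ?)",
                 [("Ann", 31), ("Bo", 45), ("Cy", 28)])
prompt = build_prompt(conn, "How many singers are older than 30?",
                      ["'older than 30' means age > 30"])
print(prompt)
```

Keeping the three parts as separate lists makes it easy to ablate one part (for example, dropping the sampled rows) when comparing prompt variants, which is the kind of design question the survey's prompt-engineering taxonomy covers.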
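The two traditional schema linking families in Figure 5, (e) similarity methods and (f) connectivity methods, can be illustrated with a toy sketch. It assumes a hypothetical schema (`singer`, `ticket`, `venue`), a hypothetical foreign-key list, and a `difflib.SequenceMatcher` ratio of at least 0.8 as the similarity score; published systems may use other matchers (such as n-gram or embedding similarity) and richer schema graphs. Step (e) links schema names that resemble words in the question; step (f) then adds every table on a shortest foreign-key path between linked tables.

```python
import re
from collections import deque
from difflib import SequenceMatcher

# Hypothetical toy schema: table -> columns, plus undirected foreign-key edges.
SCHEMA = {
    "singer": ["singer_id", "name", "age"],
    "ticket": ["ticket_id", "singer_id", "venue_id", "price"],
    "venue": ["venue_id", "city", "capacity"],
}
FOREIGN_KEYS = [("ticket", "singer"), ("ticket", "venue")]


def similarity_link(question, schema, threshold=0.8):
    """(e) Similarity: link table/column names that resemble a question token."""
    tokens = re.findall(r"[a-z]+", question.lower())
    linked = set()
    for table, cols in schema.items():
        for name in [table] + cols:
            text = name.replace("_", " ")
            if any(SequenceMatcher(None, tok, text).ratio() >= threshold
                   for tok in tokens):
                linked.add(table if name == table else f"{table}.{name}")
    return linked


def shortest_path(graph, src, dst):
    """Breadth-first search returning the node list from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def connect(tables, fks):
    """(f) Connectivity: add tables on shortest FK paths between linked tables."""
    graph = {}
    for a, b in fks:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    ordered = sorted(tables)
    result = set(ordered)
    for i, src in enumerate(ordered):
        for dst in ordered[i + 1:]:
            path = shortest_path(graph, src, dst)
            if path:
                result.update(path)
    return result


question = "Which singers played at a venue in Berlin?"
linked = similarity_link(question, SCHEMA)
print(sorted(linked))  # → ['singer', 'venue']
tables = {item.split(".")[0] for item in linked}
print(sorted(connect(tables, FOREIGN_KEYS)))  # → ['singer', 'ticket', 'venue']
```

The question never mentions `ticket`, so similarity alone misses it; the connectivity step recovers it because it is the only join path between `singer` and `venue`.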