From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems

Ali Mohammadjafari; Anthony S. Maida; Raju Gottumukkala

TL;DR

The paper surveys LLM-based Text-to-SQL systems, focusing on how Retrieval Augmented Generation and Graph RAG address NL-to-SQL challenges such as schema understanding, ambiguity, and cross-domain generalization. It traces evolution from rule-based to LLM-based architectures, studies benchmarks and metrics, and offers a taxonomy of methods including in-context learning, fine-tuning, and RAG. The authors highlight Graph RAG as a promising direction for grounding queries in structured knowledge graphs, improving accuracy and scalability. It also discusses remaining limitations and open challenges, including computational efficiency, dynamic schemas, contextual disambiguation, ethics and privacy, and the role of human-in-the-loop, providing directions for future research.

Abstract

LLMs, when used with Retrieval Augmented Generation (RAG), are greatly improving the state of the art in translating natural language queries into structured and correct SQL. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches that use RAG systems. We discuss benchmarks, evaluation methods, and evaluation metrics. Uniquely among such reviews, we also study the use of Graph RAG for better contextual accuracy and schema linking in these systems. Finally, we highlight key challenges, such as computational efficiency, model robustness, and data privacy, that must be addressed to improve LLM-based text-to-SQL systems.

Paper Structure (29 sections, 2 equations, 5 figures, 6 tables)

Introduction
Overview of the Text-to-SQL task
Introducing Retrieval Augmented Generation as a Solution
Contributions of this Survey
Evolution of Text-to-SQL Systems in the Literature
Evolutionary Progression
LLM-based Text-to-SQL Architecture and RAG-Integrated Systems
Benchmarks and Evaluation Methods
Types of Datasets used in Benchmarks
Evaluation Metrics Used in Benchmarks
Content Matching-based Metrics
Execution-based Metrics
Methods
In-context Learning
Fine-Tuning
...and 14 more sections

Figures (5)

Figure 1: How text-to-SQL research has evolved over time, highlighting different implementation approaches. Each phase includes key techniques and notable works. The dates are approximate, based on when these key works were released, with a margin of error of about a year. The design is inspired by hong2024next, zhao2023survey.
Figure 2: Key stages of the traditional text-to-SQL process using Large Language Models (LLMs).
Figure 3: High-level workflow of a RAG-based text-to-SQL system (RAG-TO-SQL).
Figure 4: Taxonomy of research approaches in LLM-based text-to-SQL. The format is adapted from xu2023large.
Figure 5: Four main evaluation metrics falling into two categories.
