From Natural Language to SQL: Review of LLM-based Text-to-SQL Systems

Ali Mohammadjafari, Anthony S. Maida, Raju Gottumukkala

TL;DR

The paper surveys LLM-based Text-to-SQL systems, focusing on how Retrieval Augmented Generation (RAG) and Graph RAG address NL-to-SQL challenges such as schema understanding, ambiguity, and cross-domain generalization. It traces the evolution from rule-based to LLM-based architectures, reviews benchmarks and metrics, and offers a taxonomy of methods including in-context learning, fine-tuning, and RAG. The authors highlight Graph RAG as a promising direction for grounding queries in structured knowledge graphs to improve accuracy and scalability. The paper also discusses remaining limitations and open challenges, including computational efficiency, dynamic schemas, contextual disambiguation, ethics and privacy, and the role of human-in-the-loop, and outlines directions for future research.

Abstract

LLMs, when used with Retrieval Augmented Generation (RAG), are greatly improving the state of the art (SOTA) in translating natural language queries into structured and correct SQL. Unlike previous reviews, this survey provides a comprehensive study of the evolution of LLM-based text-to-SQL systems, from early rule-based models to advanced LLM approaches that use RAG systems. We discuss benchmarks, evaluation methods, and evaluation metrics. We also study the use of Graph RAG for better contextual accuracy and schema linking in these systems. Finally, we highlight key challenges for improving LLM-based text-to-SQL systems, such as computational efficiency, model robustness, and data privacy.

Paper Structure (29 sections, 2 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: How text-to-SQL research has evolved over time, highlighting different implementation approaches. Each phase includes key techniques and notable works. The dates are approximate, based on when these key works were released, with a margin of error of about a year. The design is inspired by hong2024next, zhao2023survey.
  • Figure 2: Key stages of the traditional text-to-SQL process using Large Language Models (LLMs).
  • Figure 3: High-level workflow of a RAG-based text-to-SQL system (RAG-TO-SQL).
  • Figure 4: Taxonomy of research approaches in LLM-based text-to-SQL. The format is adapted from xu2023large.
  • Figure 5: Four main evaluation metrics falling into two categories.