SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL

Jimin Lee, Ingeol Baek, Byeongjeong Kim, Hyunkyung Bae, Hwanhee Lee

TL;DR

SAFE-SQL tackles Text-to-SQL when relevant training examples are scarce by generating self-augmented, schema-aware demonstrations and filtering them with a fine-grained relevance mechanism. The approach combines Schema Linking, dynamic Example Generation, multi-faceted Relevance Scoring, and Threshold-based filtering to build high-quality in-context learning prompts, enabling robust SQL generation without extra fine-tuning. Empirical results on Spider and BIRD show state-of-the-art or competitive execution accuracy, with notable gains in hard and unseen cases, including when open models are used. The work demonstrates the practical value of unsupervised synthetic demonstrations and precise example selection for LLM-based Text-to-SQL in real-world, data-constrained settings.

Abstract

Text-to-SQL aims to convert natural language questions into executable SQL queries. While previous approaches, such as skeleton-masked selection, have demonstrated strong performance by retrieving similar training examples to guide large language models (LLMs), they struggle in real-world scenarios where such examples are unavailable. To overcome this limitation, we propose Self-Augmentation in-context learning with Fine-grained Example selection for Text-to-SQL (SAFE-SQL), a novel framework that improves SQL generation by generating and filtering self-augmented examples. SAFE-SQL first prompts an LLM to generate multiple Text-to-SQL examples relevant to the test input. SAFE-SQL then filters these examples through three relevance assessments, constructing high-quality in-context learning examples. Using self-generated examples, SAFE-SQL surpasses previous zero-shot and few-shot Text-to-SQL frameworks, achieving higher execution accuracy. Notably, our approach provides additional performance gains in extra hard and unseen scenarios, where conventional methods often fail.
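
The generate-then-filter idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Example` class, the 0-10 score scale, the plain average used to combine the three relevance scores, and the threshold value of 8.0. In the actual framework an LLM would produce both the candidate examples and their scores; here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A self-generated (question, SQL) pair with three relevance scores.

    The three scores stand in for the three relevance assessments; their
    names, scale (0-10), and values are hypothetical.
    """
    question: str
    sql: str
    scores: tuple  # three hypothetical relevance scores


def relevance(example: Example) -> float:
    # Assumption: combine the three scores with a plain average.
    return sum(example.scores) / len(example.scores)


def filter_examples(examples, threshold=8.0):
    # Keep only candidates whose combined score reaches the threshold.
    # The threshold value is an illustrative assumption.
    return [ex for ex in examples if relevance(ex) >= threshold]


# Hard-coded candidates standing in for LLM-generated examples.
candidates = [
    Example("How many singers are there?",
            "SELECT COUNT(*) FROM singer", (9, 9, 9)),
    Example("List all stadium names.",
            "SELECT name FROM stadium", (4, 6, 5)),
]

kept = filter_examples(candidates)
for ex in kept:
    print(ex.question, "->", ex.sql)
```

Only the surviving examples would then be placed in the in-context learning prompt ahead of the test question.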

Paper Structure

This paper contains 34 sections, 2 equations, 5 figures, 10 tables.

Figures (5)

  • Figure 1: The example on the left shows a failure to retrieve relevant examples because keywords are masked, so superficially similar but actually unrelated questions are selected. In contrast, our self-augmented approach generates N examples and filters them using three criteria, resulting in appropriate example retrieval.
  • Figure 2: Overall flow of our proposed SAFE-SQL.
  • Figure 3: (Left) Correlation between question embedding similarity and average EX. (Right) Average EX across embedding similarity bins.
  • Figure 4: Performance of GPT-4o at different relevance score thresholds.
  • Figure 5: Embedding of Spider dev set training questions.