Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL

Dayton G. Thorpe; Andrew J. Duberstein; Ian A. Kinsey

TL;DR

This paper tackles the persistent gap between automated text-to-SQL performance and expert human capability by introducing two methods, Dubo-SQL v1 and v2. Dubo-SQL v1 applies a low-cost fine-tuning pipeline to GPT-3.5 Turbo, while Dubo-SQL v2 uses a diverse retrieval-augmented generation (RAG) strategy with GPT-4 Turbo, together with a SQL compiler loop and JSON outputs, to raise execution accuracy (EX) on the BIRD-SQL benchmark. On the holdout test set, v1 achieves an EX of $60.71\%$, surpassing several strong baselines; on the dev set, v2 attains an EX of $61.47\%$, outpacing many prior methods though still trailing some ensembles. The work also provides a thorough cost analysis, showing favorable training and inference costs for v1 and a higher, though still cost-conscious, profile for v2, and includes ablation studies that quantify the contributions of error correction, JSON formatting, and diverse retrieval. Overall, Dubo-SQL demonstrates that combining low-cost fine-tuning with diverse RAG and careful input/output design can significantly advance text-to-SQL performance while keeping costs practical.
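
The TL;DR credits part of v2's accuracy to a SQL compiler loop and JSON outputs. The sketch below is a hedged illustration of how such an error-correction loop could work; it is not the authors' implementation. It assumes a SQLite database (SQLite's `EXPLAIN` stands in for the "compiler" check), a hypothetical `generate` callable that wraps the LLM and should reply with a JSON string containing a `sql` key, and an illustrative retry budget `max_attempts`. The helper names `check_sql` and `generate_with_repair` are likewise hypothetical.

```python
import json
import sqlite3
from typing import Callable, Optional

# Hypothetical sketch of an error-correction ("compiler") loop; not the
# Dubo-SQL implementation. Assumes a SQLite database and an LLM wrapper
# `generate(prompt) -> str` that should reply with JSON like {"sql": "..."}.


def check_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if SQLite can compile `sql`, otherwise the error message."""
    try:
        # EXPLAIN prepares the statement (catching syntax errors and unknown
        # tables/columns) and returns its bytecode instead of the query results.
        conn.execute("EXPLAIN " + sql).fetchall()
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_repair(
    question: str,
    conn: sqlite3.Connection,
    generate: Callable[[str], str],
    max_attempts: int = 3,  # illustrative retry budget (assumption)
) -> str:
    """Request SQL as JSON and feed parse or compiler errors back to the model."""
    prompt = question
    sql = ""
    for _ in range(max_attempts):
        reply = generate(prompt)
        try:
            candidate = json.loads(reply)["sql"]
            if not isinstance(candidate, str):
                raise TypeError("'sql' must be a string")
        except (json.JSONDecodeError, KeyError, TypeError):
            prompt = f'{question}\nReply only with JSON of the form {{"sql": "..."}}.'
            continue
        sql = candidate
        error = check_sql(conn, sql)
        if error is None:
            return sql  # compiles cleanly
        prompt = f"{question}\nThis SQL failed: {sql}\nError: {error}\nReturn corrected JSON."
    return sql  # best effort once the retry budget is spent


# Usage with a stubbed model that first misspells a column name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (name TEXT, enrollment INTEGER)")
replies = iter([
    '{"sql": "SELECT nam FROM schools"}',
    '{"sql": "SELECT name FROM schools"}',
])
fixed = generate_with_repair("List school names.", conn, lambda prompt: next(replies))
print(fixed)  # SELECT name FROM schools
```

Asking for JSON makes the model's answer machine-parseable, so a malformed reply can be detected and retried in the same loop as a query that fails to compile.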

Abstract

The current state of the art (SOTA) for automated text-to-SQL still falls well short of expert human performance, as measured by execution accuracy (EX) on the BIRD-SQL benchmark. The most accurate methods are also slow and expensive. To advance the SOTA for text-to-SQL while reducing cost and improving speed, we explore combining low-cost fine tuning, novel methods for diverse retrieval-augmented generation (RAG), and new input and output formats that help large language models (LLMs) achieve higher EX. We introduce two new methods, Dubo-SQL v1 and v2. Dubo-SQL v1 sets a new record for EX on the holdout test set of BIRD-SQL, and Dubo-SQL v2 achieves even higher performance on the BIRD-SQL dev set. Dubo-SQL v1 relies on OpenAI LLMs but uses the low-cost GPT-3.5 Turbo, yet it still outperforms the next-best OpenAI-based model, which uses the more expensive GPT-4. Dubo-SQL v1 also exceeds the performance of the next-best model using GPT-3.5 by over 20%. Dubo-SQL v2 uses GPT-4 Turbo and RAG in place of fine tuning to push EX higher.
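
EX counts a predicted query as correct when executing it returns the same result as the gold (reference) query. The following minimal sketch assumes SQLite databases and, as a simplification, compares results as unordered sets of rows; the official BIRD-SQL evaluation script may differ in details such as timeouts. The function names `execution_match` and `execution_accuracy` are hypothetical.

```python
import sqlite3
from typing import Sequence, Tuple

# Simplified execution-accuracy (EX) sketch, assuming SQLite databases and
# set-based comparison of result rows; not the official BIRD-SQL evaluator.


def execution_match(conn: sqlite3.Connection, predicted: str, gold: str) -> bool:
    """True if `predicted` runs and returns the same set of rows as `gold`.

    Row order and duplicate rows are ignored because results are compared as sets.
    """
    try:
        pred_rows = conn.execute(predicted).fetchall()
    except sqlite3.Error:
        return False  # a prediction that fails to execute counts as wrong
    gold_rows = conn.execute(gold).fetchall()  # gold SQL is assumed valid
    return set(pred_rows) == set(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: Sequence[Tuple[str, str]]) -> float:
    """EX as a percentage over (predicted, gold) query pairs."""
    if not pairs:
        return 0.0
    correct = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return 100.0 * correct / len(pairs)


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE schools (name TEXT, enrollment INTEGER);
    INSERT INTO schools VALUES ('A', 100), ('B', 250), ('C', 400);
    """
)
pairs = [
    # Different SQL, same rows (B, C): counted as correct.
    ("SELECT name FROM schools WHERE enrollment > 200",
     "SELECT name FROM schools WHERE enrollment >= 250"),
    # Same rows in a different order: counted as correct.
    ("SELECT name FROM schools ORDER BY name DESC", "SELECT name FROM schools"),
    # Different result (3 vs 750): counted as wrong.
    ("SELECT COUNT(*) FROM schools", "SELECT SUM(enrollment) FROM schools"),
    # Misspelled column, so the query fails: counted as wrong.
    ("SELECT nme FROM schools", "SELECT name FROM schools"),
]
print(execution_accuracy(conn, pairs))  # 50.0
```

Because EX judges only the returned rows, two differently written queries can both be scored correct, which is why the first pair above counts as a match.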

Paper Structure

This paper contains 15 sections, 3 figures, and 3 tables.

Introduction
Related Work
Methodology
Dubo-SQL v1
Dubo-SQL v2
Experiment
Models
Dataset
Metrics
Results
Execution Accuracy
Cost
Ablation Study
Conclusions
Limitations

Figures (3)

Figure 1: A diagram of Dubo-SQL v1
Figure 2: A diagram of Dubo-SQL v2
Figure 3: (a) Performance with varying numbers of few-shot examples. (b) Performance with non-zero temperature.