Table of Contents
Fetching ...

Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang

TL;DR

Open-SQL presents a systematic approach for Text-to-SQL with open-source LLMs by combining Open Prompt-based question representation, LoRA-based supervised fine-tuning, and in-context learning with Open Example Curation and CoT templates. It demonstrates large performance gains on the BIRD dataset, with Code Llama-7B achieving 48.24% execution accuracy on BIRD-Dev, surpassing GPT-4 at 46.35% in this setting, and substantial improvements on Spider as well. The work highlights token-efficient techniques to handle large schemas and shows a path for closing the gap between open-source and proprietary LLMs in Text-to-SQL. Limitations include preserving in-context learning after fine-tuning and challenges in schema linking, guiding future research directions.

Abstract

Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.

Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models

TL;DR

Open-SQL presents a systematic approach for Text-to-SQL with open-source LLMs by combining Open Prompt-based question representation, LoRA-based supervised fine-tuning, and in-context learning with Open Example Curation and CoT templates. It demonstrates large performance gains on the BIRD dataset, with Code Llama-7B achieving 48.24% execution accuracy on BIRD-Dev, surpassing GPT-4 at 46.35% in this setting, and substantial improvements on Spider as well. The work highlights token-efficient techniques to handle large schemas and shows a path for closing the gap between open-source and proprietary LLMs in Text-to-SQL. Limitations include preserving in-context learning after fine-tuning and challenges in schema linking, guiding future research directions.

Abstract

Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present \ours, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the \openprompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the \openexample method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as \textbf{Variable-length Open DB Schema}, \textbf{Target Column Truncation}, and \textbf{Example Column Truncation}, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54\% to 41.04\% and Code Llama-7B from 14.54\% to 48.24\% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35\%) on the BIRD-Dev dataset.
Paper Structure (18 sections, 1 equation, 5 figures, 13 tables)

This paper contains 18 sections, 1 equation, 5 figures, 13 tables.

Figures (5)

  • Figure 1: Statistics detailing the failures of Llama2 and Code Llama under a zero-shot setting in Text-to-SQL tasks.
  • Figure 2: The comparative results of GPT-3.5-TURBO, Llama2, and Code Llama across different table schemas on BIRD-dev.
  • Figure 3: Zero-shot outcomes of original open-source LLMs on the BIRD-dev dataset. "W/O COT" indicates inference without Chain-of-Thought (COT), while "COT-SP-PRED" and "COT-SK-PRED" represent inference with the COT-SP and COT-SK methods, respectively, using only the definition of predicted tables and columns. Additionally, "COT-SP-FULL" and "COT-SK-FULL" represent inference with the COT-SP and COT-SK methods, respectively, incorporating the complete database definition.
  • Figure 4: The comparative results of supervised-finetuned Llama2 and Code Llama across various table schemas on BIRD-dev.
  • Figure 5: Zero-shot outcomes of open-source Large Language Models (LLMs) following supervised fine-tuning, both with and without Chain-of-Thought, on the BIRD-dev dataset. "SFTI-W/O COT" indicates supervised fine-tuning and inference without Chain-of-Thought (COT), while "SFTI-COT-SP-PRED" and "SFTI-COT-SK-PRED" represent supervised fine-tuning and inference with the COT-SP and COT-SK methods, respectively, using only the definition of predicted tables and columns. Additionally, "SFTI-COT-SP-FULL" and "SFTI-COT-SK-FULL" represent supervised fine-tuning and inference with the COT-SP and COT-SK methods, respectively, incorporating the complete database definition.