Open-SQL Framework: Enhancing Text-to-SQL on Open-source Large Language Models
Xiaojun Chen, Tianle Wang, Tianhao Qiu, Jianbin Qin, Min Yang
TL;DR
Open-SQL presents a systematic approach for Text-to-SQL with open-source LLMs by combining Open Prompt-based question representation, LoRA-based supervised fine-tuning, and in-context learning with Open Example Curation and CoT templates. It demonstrates large performance gains on the BIRD dataset, with Code Llama-7B achieving 48.24% execution accuracy on BIRD-Dev, surpassing GPT-4 at 46.35% in this setting, and substantial improvements on Spider as well. The work highlights token-efficient techniques for handling large schemas and shows a path toward closing the gap between open-source and proprietary LLMs in Text-to-SQL. Its limitations, namely the difficulty of preserving in-context learning ability after fine-tuning and open challenges in schema linking, point to directions for future research.
Abstract
Despite the success of large language models (LLMs) in Text-to-SQL tasks, open-source LLMs encounter challenges in contextual understanding and response coherence. To tackle these issues, we present Open-SQL, a systematic methodology tailored for Text-to-SQL with open-source LLMs. Our contributions include a comprehensive evaluation of open-source LLMs in Text-to-SQL tasks, the Open Prompt strategy for effective question representation, and novel strategies for supervised fine-tuning. We explore the benefits of Chain-of-Thought in step-by-step inference and propose the Open Example Curation method for enhanced few-shot learning. Additionally, we introduce token-efficient techniques, such as Variable-length Open DB Schema, Target Column Truncation, and Example Column Truncation, addressing challenges in large-scale databases. Our findings emphasize the need for further investigation into the impact of supervised fine-tuning on contextual learning capabilities. Remarkably, our method significantly improved Llama2-7B from 2.54% to 41.04% and Code Llama-7B from 14.54% to 48.24% on the BIRD-Dev dataset. Notably, the performance of Code Llama-7B surpassed GPT-4 (46.35%) on the BIRD-Dev dataset.
