SQLong: Enhanced NL2SQL for Longer Contexts with LLMs
Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Tien Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong
TL;DR
The paper addresses the difficulty NL2SQL models have with large database schemas, whose long prompts strain the models' limited context length. It introduces SQLong, a data-augmentation pipeline that extends schemas with synthetic CREATE TABLE statements and data rows to build long-context training and benchmark prompts. Models are then trained with a supervised finetuning objective that maximizes $ \mathbb{E}_{(\mathbf{x},\mathbf{s})\sim\mathbf{T}} [ \sum_{i=1}^{|\mathbf{s}|} \log p_{\theta}(s_i | \mathbf{s}_{<i}, \mathbf{x}) ] $, where $\mathbf{T}$ is the training set of pairs of an input prompt $\mathbf{x}$ and a target SQL token sequence $\mathbf{s}$; this teaches LLMs to map natural language questions to SQL under long contexts. Empirical results on Spider and BIRD across multiple models show consistent gains: average improvements of over 2.2% on the original data and 11% over non-SQLong baselines, and up to 6% over larger models on the long-context benchmarks. For example, Llama-3.1-8B-Instruct reaches 77.1% at 8k and 72.3% at 24k context length on Spider-test. The work demonstrates practical value for real-world databases and suggests, as future work, integrating RAG-based schema linking to produce more concise long-context inputs.
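As a minimal sketch of the finetuning objective above, assume a hypothetical `TokenProb` callable that returns the model's probability of the next SQL token given the prompt and the SQL prefix. The expectation over $\mathbf{T}$ is then estimated by averaging each example's sequence log-likelihood. Only the SQL tokens $s_i$ are scored; the prompt $\mathbf{x}$ (question plus a possibly very long schema) only conditions the predictions. The names `TokenProb`, `sequence_log_likelihood`, and `sft_objective` are illustrative and do not come from the paper.

```python
import math
from typing import Callable, Sequence

# Hypothetical signature: (prompt tokens x, SQL prefix s_<i, next SQL token s_i)
# -> model probability p_theta(s_i | s_<i, x).
TokenProb = Callable[[Sequence[str], Sequence[str], str], float]


def sequence_log_likelihood(prob: TokenProb, x: Sequence[str], s: Sequence[str]) -> float:
    """sum_{i=1}^{|s|} log p(s_i | s_<i, x): only SQL tokens contribute."""
    # s[:i] is the prefix s_<i; the prompt x is context only and is never scored.
    return sum(math.log(prob(x, s[:i], s[i])) for i in range(len(s)))


def sft_objective(prob: TokenProb, data: Sequence[tuple[Sequence[str], Sequence[str]]]) -> float:
    """Empirical estimate of E_{(x,s)~T}[...]: the mean over training pairs."""
    return sum(sequence_log_likelihood(prob, x, s) for x, s in data) / len(data)
```

In a typical training loop one minimizes the negative of this quantity, which amounts to a cross-entropy loss with the prompt tokens masked out. With a toy model that assigns probability 0.25 to every token, two examples with 2 and 4 SQL tokens give an objective of 3·log(0.25), regardless of how long the prompts are.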
Abstract
Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes on large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios for the NL2SQL task. SQLong generates augmented datasets by extending existing database schemas with additional synthetic CREATE TABLE commands and corresponding data rows, sampled from diverse schemas in the training data. This approach effectively simulates long-context scenarios during finetuning and evaluation. Through experiments on the Spider and BIRD datasets, we demonstrate that LLMs finetuned with SQLong-augmented data significantly outperform those trained on standard datasets. These results demonstrate SQLong's practicality and its impact on improving NL2SQL capabilities in real-world settings with complex database schemas.
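The schema-extension step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Table` type, the whitespace-based `approx_tokens` budget, skipping tables whose names clash with the gold schema, and inserting each added table at a random position are all hypothetical choices made here. Tables sampled from other databases in the training data are added to the gold schema until a target token budget (e.g., 8k) would be exceeded, and the result is rendered into a prompt.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    create_sql: str  # the CREATE TABLE statement
    rows: list[str] = field(default_factory=list)  # sample data rows, e.g. INSERT statements

    def render(self) -> str:
        return "\n".join([self.create_sql, *self.rows])


def approx_tokens(text: str) -> int:
    # Assumption: whitespace tokens stand in for the target model's tokenizer.
    return len(text.split())


def extend_schema(gold: list[Table], pool: list[Table], budget: int, seed: int = 0) -> list[Table]:
    """Add tables sampled from other databases (`pool`) while staying within `budget` tokens."""
    rng = random.Random(seed)
    tables = list(gold)  # copy so the caller's gold schema is not mutated
    used = {t.name.lower() for t in tables}
    total = sum(approx_tokens(t.render()) for t in tables)
    candidates = pool[:]
    rng.shuffle(candidates)  # sample distractor tables in random order
    for t in candidates:
        if t.name.lower() in used:
            continue  # assumption: skip name collisions so the gold SQL stays unambiguous
        cost = approx_tokens(t.render())
        if total + cost > budget:
            continue  # too large for the remaining budget; a smaller table may still fit
        # Assumption: random position, so gold tables are not always listed first.
        tables.insert(rng.randrange(len(tables) + 1), t)
        used.add(t.name.lower())
        total += cost
    return tables


def build_prompt(tables: list[Table], question: str) -> str:
    # Hypothetical prompt layout: schema and rows, then the question, then an SQL cue.
    schema = "\n\n".join(t.render() for t in tables)
    return f"{schema}\n\n-- Question: {question}\n-- SQL:"
```

Skipping name collisions keeps the gold SQL query valid against the extended schema. A real pipeline would count tokens with the target model's tokenizer rather than by whitespace, and would draw `pool` only from databases other than the example's own.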
