SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Peixian Ma, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, Jian Guo
TL;DR
The paper tackles NL2SQL reasoning over complex schemas by introducing SQL-R1, a reasoning-focused NL2SQL model trained with reinforcement learning. It designs a four-part reward function and uses GRPO to optimize SQL generation, drawing on synthetic data (SynSQL-2.5M) and a small RL training set to overcome data scarcity. The approach achieves competitive execution accuracy on the Spider and BIRD benchmarks, with evidence that RL can improve reasoning quality and interpretability, including explicit reasoning traces. Ablation and cold-start analyses highlight the importance of reward design and data provenance, suggesting synthetic data engineering as a key lever for generalization. Overall, SQL-R1 shows that RL-based training is a viable way to improve NL2SQL reasoning while supporting domain adaptation and transparency in high-risk applications.
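GRPO replaces a learned value critic with a baseline computed from a group of completions sampled for the same prompt: each completion's reward is normalized by the group's mean and standard deviation. The sketch below shows only that group-relative advantage step; the function name `group_relative_advantages`, the epsilon, and the choice of population standard deviation are illustrative assumptions, not details taken from the SQL-R1 code. The full GRPO objective also includes a clipped policy-ratio term and a KL penalty, which are omitted here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Group-relative advantages for G completions sampled from one prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps): completions that score above
    their siblings get a positive advantage, those below get a negative one,
    and no learned value function (critic) is needed.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std (assumption); some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical total rewards for four SQL candidates sampled for one question.
    rewards = [6.0, -3.0, 6.0, 0.0]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:+.1f}  advantage={a:+.3f}")
```

If every candidate in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason reward design matters so much for this kind of training.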
Abstract
Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by translating natural language queries into structured SQL statements. Despite recent advances in human-computer interaction for database applications, significant challenges remain, particularly in reasoning performance in complex scenarios involving multi-table joins and nested queries. Current methods primarily train NL2SQL models with supervised fine-tuning (SFT), which may limit their adaptability and interpretability in new environments (e.g., finance and healthcare). To improve the reasoning performance of NL2SQL models in such complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL reward function tailored to NL2SQL tasks and discuss the impact of cold start and synthetic data on the effectiveness of RL training. In addition, we achieve competitive accuracy using only a small amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In our experiments, SQL-R1 achieves execution accuracy of 88.6% on Spider and 67.1% on BIRD. The code is available at https://github.com/IDEA-FinAI/SQL-R1.
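The abstract does not spell out the reward function. The sketch below is a minimal, hypothetical composite reward of the four-part kind described in the TL;DR, assuming the parts are a format check (reasoning inside `<think>` tags, SQL inside `<answer>` tags), an executability check, a result check that compares the candidate's execution result with the gold query's on SQLite (the same comparison that underlies execution accuracy), and a small length term. All tag names, weights, and the helpers `nl2sql_reward`, `execute`, and `demo_db` are illustrative assumptions, not the SQL-R1 implementation.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical component values: illustrative only, not the paper's settings.
FORMAT_OK, FORMAT_BAD = 1.0, -1.0
EXEC_OK, EXEC_BAD = 2.0, -2.0
RESULT_OK, RESULT_BAD = 3.0, -3.0
LENGTH_MAX = 0.5          # cap on the length term
TARGET_THINK_CHARS = 400  # hypothetical reasoning length at which the length term saturates

# Assumed output layout: reasoning in <think>...</think>, SQL in <answer>...</answer>.
LAYOUT = re.compile(r"\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*", re.DOTALL)


def execute(conn, sql):
    """Run one query; return its rows as a multiset (row order ignored), or None on failure."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except (sqlite3.Error, sqlite3.Warning):
        return None


def nl2sql_reward(completion, gold_sql, conn):
    """Composite reward: format + execution + result + length (hypothetical sketch)."""
    match = LAYOUT.fullmatch(completion)
    if match is None:
        return FORMAT_BAD  # unparseable output: there is no SQL to run
    think, sql = match.group(1), match.group(2).strip()
    reward = FORMAT_OK

    predicted = execute(conn, sql)
    if predicted is None:
        return reward + EXEC_BAD  # SQL fails to run: no result or length credit
    reward += EXEC_OK

    if predicted == execute(conn, gold_sql):
        reward += RESULT_OK
        # Length term only for correct answers, so verbosity alone earns nothing.
        reward += LENGTH_MAX * min(len(think) / TARGET_THINK_CHARS, 1.0)
    else:
        reward += RESULT_BAD
    return reward


def demo_db():
    """Tiny in-memory database used for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE emp(name TEXT, dept TEXT, salary INT);"
        "INSERT INTO emp VALUES ('a', 'x', 10), ('b', 'x', 20), ('c', 'y', 30);"
    )
    conn.execute("PRAGMA query_only = ON")  # refuse writes from generated SQL
    return conn


if __name__ == "__main__":
    conn = demo_db()
    gold = "SELECT dept, SUM(salary) FROM emp GROUP BY dept"
    candidates = [
        "<think>Sum salary per dept.</think><answer>" + gold + "</answer>",
        "<think>Sum salary per dept.</think><answer>SELECT dept FROM emp</answer>",
        "<think>Sum salary per dept.</think><answer>SELECT * FROM payroll</answer>",
        gold,  # correct SQL but missing the required tags
    ]
    for candidate in candidates:
        print(nl2sql_reward(candidate, gold, conn))
```

Comparing results as multisets ignores row order, which is a simplification; an evaluator may instead respect `ORDER BY`. `PRAGMA query_only` keeps generated `DELETE` or `UPDATE` statements from modifying the database while rewards are computed. Scores of this kind, computed for several candidates per question, are what a group-relative method such as GRPO would then normalize.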
