
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning

Peixian Ma, Xialie Zhuang, Chengjin Xu, Xuhui Jiang, Ran Chen, Jian Guo

TL;DR

The paper tackles NL2SQL reasoning over complex schemas by introducing SQL-R1, a reasoning-focused NL2SQL model trained with reinforcement learning. It designs a four-part reward system and uses GRPO to optimize SQL generation, leveraging synthetic data (SynSQL-2.5M) and a small RL dataset to address data scarcity. The approach achieves competitive accuracy on the Spider and BIRD benchmarks, with evidence that RL can improve reasoning quality and interpretability, including explicit reasoning traces. Ablation and cold-start analyses highlight the importance of reward design and data provenance, suggesting synthetic data engineering as a key lever for generalization. Overall, SQL-R1 demonstrates that RL-based training is a viable way to improve NL2SQL reasoning performance while supporting domain adaptation and transparency in high-risk applications.
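GRPO, the optimizer named above, scores each sampled completion relative to the other completions drawn for the same prompt instead of against a learned value function. The sketch below shows only that group-normalization step, assuming per-candidate rewards have already been computed; the function name and the reward values are hypothetical, and it uses the population standard deviation, whereas specific implementations may use the sample standard deviation instead.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Turn one group's rewards into GRPO-style advantages.

    Each completion sampled for the same prompt is scored against its own group:
    advantage = (reward - group mean) / (group std + eps). No critic is used.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four SQL candidates sampled for one question.
advantages = group_relative_advantages([3.0, -1.0, 3.0, 1.0])
print([round(a, 3) for a in advantages])
```

With these rewards, the two highest-scoring candidates receive equal positive advantages and the lowest-scoring one receives the most negative advantage, so the policy update shifts probability toward the better SQL. The `eps` term keeps the division defined when every candidate earns the same reward, in which case all advantages are zero.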

Abstract

Natural Language to SQL (NL2SQL) enables intuitive interaction with databases by transforming natural language queries into structured SQL statements. Despite recent advances in human-computer interaction for database applications, significant challenges persist, particularly in reasoning performance for complex scenarios involving multi-table joins and nested queries. Current methods primarily rely on supervised fine-tuning (SFT) to train NL2SQL models, which may limit adaptability and interpretability in new environments (e.g., finance and healthcare). To improve the reasoning performance of NL2SQL models in such complex situations, we introduce SQL-R1, a novel NL2SQL reasoning model trained with reinforcement learning (RL) algorithms. We design a specialized RL reward function tailored to NL2SQL tasks and discuss the impact of cold start and synthetic data on the effectiveness of RL training. In addition, we achieve competitive accuracy using only a small amount of synthetic NL2SQL data for augmented training, and we further explore data engineering for RL. In our experiments, SQL-R1 achieves execution accuracy of 88.6% on the Spider benchmark and 67.1% on BIRD. The code is available at https://github.com/IDEA-FinAI/SQL-R1
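Execution accuracy, the metric reported above, counts a prediction as correct when running it against the database returns the same result as the reference query, regardless of how the SQL is written. The following is a minimal sketch using Python's built-in `sqlite3` module and a toy in-memory table; the helper names and schema are hypothetical, and it compares results as unordered multisets of rows, a simplification that may differ in details from the official Spider and BIRD evaluation scripts.

```python
import sqlite3
from collections import Counter


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries run and yield the same multiset of rows.

    Row order is ignored; a predicted query that raises an error counts as a miss.
    """
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


# Hypothetical toy database and query pairs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Ana', 'eng', 120), (2, 'Bo', 'eng', 100), (3, 'Cy', 'ops', 90);
""")
pairs = [
    ("SELECT name FROM employee WHERE dept = 'eng'",
     "SELECT name FROM employee WHERE salary >= 100"),
    ("SELECT COUNT(*) FROM employee", "SELECT COUNT(id) FROM employee"),
    ("SELECT nme FROM employee", "SELECT name FROM employee"),
]
print(execution_accuracy(conn, pairs))
```

The first pair counts as a hit even though the two queries use different filters, because both return Ana and Bo; the third pair fails because `nme` is not a column, so two of the three predictions are counted as correct.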

Paper Structure

This paper contains 47 sections, 5 equations, 12 figures, 10 tables.

Figures (12)

  • Figure 1: Demonstration of our work. Previous work on NL2SQL primarily relies on supervised fine-tuning to teach the model how to generate SQL. However, when the database schema is complex or the semantics are ambiguous, the fine-tuned model may produce SQL that does not align with the user's intentions, because it depends on a fixed generation strategy and on previously seen data. By introducing reinforcement learning algorithms, the model can receive intuitive feedback from the database during training. This feedback encourages the model to independently explore different SQL-generation reasoning approaches, ultimately improving the accuracy of its output.
  • Figure 2: Performance and model scale on the BIRD-Dev dataset.
  • Figure 3: Example for NL2SQL Reasoning - No RL Training - Challenge Sample
  • Figure 4: Example for NL2SQL Reasoning - RL Training - Challenge Sample
  • Figure 5: Example for NL2SQL Reasoning - No RL Training - Moderate Sample
  • ...and 7 more figures
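Figure 1 describes the model receiving feedback from the database during training, and the TL;DR mentions a four-part reward. As an illustration only, the sketch below combines four hypothetical terms (output format, whether the SQL executes, whether its result matches the gold query, and a length bonus); the tag format, function name, term names, and numeric values are assumptions chosen for illustration and are not taken from this summary.

```python
import re
import sqlite3
from collections import Counter

# Assumed output format: reasoning in <think>...</think>, then SQL in <answer>...</answer>.
OUTPUT_PATTERN = re.compile(r"^<think>.+?</think>\s*<answer>(.+?)</answer>$", re.DOTALL)


def four_part_reward(completion, conn, gold_sql, max_chars=2000):
    """Score one sampled completion with four illustrative (hypothetical) reward terms.

    Each stage stops early on failure, so later terms are only earned after earlier ones.
    """
    match = OUTPUT_PATTERN.match(completion.strip())
    if match is None:
        return -1.0  # format term: output could not be parsed
    reward = 1.0  # format term: expected tags present

    try:
        pred_rows = conn.execute(match.group(1).strip()).fetchall()
    except sqlite3.Error:
        return reward - 2.0  # execution term: database rejected the SQL
    reward += 2.0  # execution term: query ran

    gold_rows = conn.execute(gold_sql).fetchall()
    if Counter(pred_rows) != Counter(gold_rows):
        return reward - 3.0  # result term: rows differ from the gold query
    reward += 3.0  # result term: rows match

    # Length term: small bonus for shorter correct completions, capped at 0.5.
    return reward + 0.5 * max(0.0, 1.0 - len(completion) / max_chars)


conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE t (x INTEGER); INSERT INTO t VALUES (1), (2);")
gold = "SELECT x FROM t"
candidates = [
    "SELECT x FROM t",  # missing tags
    "<think>use t</think><answer>SELECT y FROM t</answer>",  # unknown column
    "<think>use t</think><answer>SELECT x FROM t WHERE x = 1</answer>",  # wrong rows
    "<think>use t</think><answer>SELECT x FROM t</answer>",  # correct
]
scores = [four_part_reward(c, conn, gold) for c in candidates]
print(scores)
```

Here an unparseable output and an unexecutable query both score -1.0, a runnable but wrong query scores 0.0, and the correct candidate scores just under 6.5, so within a sampled group the correct SQL stands out clearly when rewards are normalized.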