
Arctic-Text2SQL-R1: Simple Rewards, Strong Reasoning in Text-to-SQL

Zhewei Yao, Guoheng Sun, Lukasz Borchmann, Gaurav Nuti, Zheyu Shen, Minghang Deng, Bohan Zhai, Hao Zhang, Ang Li, Yuxiong He

TL;DR

This work tackles the Text2SQL problem by introducing Arctic-Text2SQL-R1, a reinforcement learning framework that uses a lightweight execution-based reward to train a model family for generating executable SQL from natural language. The approach integrates GRPO, online RL, strong supervised initialization, and careful data curation (including Gretel-Synth synthetic data with model-based filtering) to achieve state-of-the-art execution accuracy across six benchmarks, with a 32B model reaching 71.83% on BIRD-test and a 7B model matching prior 70B-class systems. Across benchmarks, Arctic-Text2SQL-R1 demonstrates robust performance and parameter efficiency, outperforming large proprietary and open-source baselines, and benefits from simple inference-time extensions such as value retrieval and majority voting. The paper also provides extensive ablations and practical lessons on data, training strategies, and evaluation diversity to guide future RL-based Text2SQL research and deployment.
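
To make "execution-based reward" and GRPO concrete, here is a minimal sketch, not the authors' implementation. It assumes SQLite databases, compares query results as sets of rows (an assumption; duplicates and row order are ignored), and assumes a binary reward of 1.0 for a matching result and 0.0 otherwise, since the exact reward values are not given here. The group-relative advantage follows the standard GRPO formulation: each sampled query's reward is normalized by the mean and standard deviation of the rewards in its sampling group (population standard deviation is used below). The helper names `run_query`, `execution_reward`, and `group_relative_advantages` are hypothetical, and the policy-gradient update itself is omitted.

```python
import sqlite3
from statistics import mean, pstdev


def run_query(db: sqlite3.Connection, sql: str):
    """Execute SQL and return its rows as a set, or None if execution fails."""
    try:
        return set(db.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def execution_reward(db: sqlite3.Connection, predicted_sql: str, gold_sql: str) -> float:
    """Hypothetical binary reward: 1.0 iff the prediction executes and its
    result set equals the gold query's result set, else 0.0."""
    predicted = run_query(db, predicted_sql)
    gold = run_query(db, gold_sql)
    return 1.0 if predicted is not None and predicted == gold else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy database and one group of sampled candidates for a single question.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);
""")
gold = "SELECT name FROM singer WHERE age > 30"
samples = [
    "SELECT name FROM singer WHERE age > 30",   # identical to gold
    "SELECT name FROM singer WHERE age >= 31",  # different SQL, same result
    "SELECT name FROM singer",                  # executes, wrong result
    "SELECT nme FROM singer",                   # fails: no such column
]
rewards = [execution_reward(db, s, gold) for s in samples]
advantages = group_relative_advantages(rewards)
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
print(advantages)
```

Because correctness is judged on execution results rather than SQL text, the second sample is rewarded even though it differs syntactically from the gold query. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.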

Abstract

Translating natural language into SQL (Text2SQL) is a longstanding challenge at the intersection of natural language understanding and structured data access. While large language models (LLMs) have significantly improved fluency in SQL generation, producing correct and executable SQL, particularly for complex queries, remains a bottleneck. We present Arctic-Text2SQL-R1, a reinforcement learning (RL) framework and model family designed to generate accurate, executable SQL using a lightweight reward signal based solely on execution correctness. Our approach avoids brittle intermediate supervision and complex reward shaping, promoting stable training and alignment with the end task. Combined with carefully curated data, strong supervised initialization, and effective training practices, Arctic-Text2SQL-R1 achieves state-of-the-art execution accuracy across six diverse Text2SQL benchmarks, including the top position on the BIRD leaderboard. Notably, our 7B model outperforms prior 70B-class systems, highlighting the framework's scalability and efficiency. We further demonstrate inference-time robustness through simple extensions such as value retrieval and majority voting. Extensive experiments and ablation studies offer both positive and negative insights, providing practical guidance for future Text2SQL research.
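
Majority voting is mentioned above only as a simple inference-time extension. One common way to realize it for SQL, sketched below under stated assumptions, is to execute every sampled candidate, group candidates by their execution result (compared as sets of rows), discard candidates that fail to execute, and return a query from the largest group. This execution-based grouping, the first-seen tie-breaking rule, and the function name `majority_vote` are illustrative assumptions, not details confirmed by the paper.

```python
import sqlite3
from collections import defaultdict


def majority_vote(db: sqlite3.Connection, candidates):
    """Return the candidate SQL whose execution result is shared by the most
    candidates. Failing candidates are dropped; ties go to the group seen first.
    Returns None if no candidate executes."""
    groups = defaultdict(list)  # frozenset of result rows -> SQL strings producing it
    for sql in candidates:
        try:
            result = frozenset(db.execute(sql).fetchall())
        except sqlite3.Error:
            continue  # non-executable candidates get no vote
        groups[result].append(sql)
    if not groups:
        return None
    # max() keeps the first maximal group; dicts preserve insertion order.
    largest = max(groups.values(), key=len)
    return largest[0]


db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE singer (name TEXT, age INTEGER);
INSERT INTO singer VALUES ('Ann', 31), ('Bo', 25), ('Cy', 40);
""")
candidates = [
    "SELECT name FROM singer",                  # result {Ann, Bo, Cy}
    "SELECT name FROM singer WHERE age > 30",   # result {Ann, Cy}
    "SELECT name FROM singer WHERE age >= 31",  # result {Ann, Cy}
    "SELECT nme FROM singer",                   # fails, ignored
]
print(majority_vote(db, candidates))  # SELECT name FROM singer WHERE age > 30
```

Voting on execution results rather than on SQL strings lets syntactically different but equivalent queries reinforce each other, which plain string matching would miss.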

Paper Structure

This paper contains 19 sections, 2 equations, 9 figures, 11 tables, and 2 algorithms.

Figures (9)

  • Figure 1: Generation length and the average accuracy across six benchmarks.
  • Figure A.1: Prompt for Generating Executable SQL Context and Synthetic Data Inserts in the Gretel-Synth Pipeline
  • Figure B.1: Prompt for BIRD Data Augmentation
  • Figure B.2: Prompt for Self-Correction Workflow
  • Figure C.1: Prompt Template for Training and Evaluation
  • ...and 4 more figures