MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

Zekun Xu, Siyu Xia, Chuhuai Yue, Jiajun Chai, Mingxue Tian, Xiaohan Wang, Wei Lin, Haoxuan Li, Guojun Yin

TL;DR

The study addresses the challenge of translating natural language to SQL by leveraging execution feedback in a multi-turn tool-integrated reasoning framework. MTIR-SQL extends GRPO with trajectory filtering and removes KL regularization to stabilize learning while enabling iterative refinement guided by SQL execution results. Empirically, a 4B-parameter MTIR-SQL model achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, outperforming several baselines across parameter scales. The approach demonstrates that integrating dynamic execution feedback into multi-turn reasoning can substantially improve Text-to-SQL performance and robustness, with strong implications for real-world database querying and tool-driven reasoning.

Abstract

As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely on static execution feedback, which restricts real-time error correction. By contrast, integrating multi-turn tool invocation with dynamic feedback could significantly improve adaptability and robustness, ultimately enhancing model performance. To address this limitation, we propose MTIR-SQL, a Multi-turn Tool-Integrated Reasoning reinforcement learning framework for Text-to-SQL. Our approach introduces an execution-aware multi-turn reasoning paradigm that incorporates database execution feedback at each reasoning step, enabling context-sensitive query generation and progressive refinement throughout the reasoning process. The framework extends the GRPO algorithm to accommodate complex multi-turn interaction scenarios. Because MTIR training is unstable and the model distribution can deviate significantly from that of the initial model, we enhance the GRPO algorithm by adding a trajectory filtering mechanism and removing the KL loss constraint. Experimental results demonstrate that MTIR-SQL, with 4B parameters, achieves 64.4% accuracy on BIRD Dev and 84.6% execution accuracy on SPIDER Dev, significantly outperforming existing approaches.
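The execution-aware multi-turn loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`execute_sql`, `multi_turn_rollout`, `stub_policy`), the use of SQLite, the stop-on-first-successful-execution rule, and the default turn limit are all assumptions made for illustration. The policy here is a hard-coded stub standing in for an LLM.

```python
import sqlite3
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (sql, execution feedback)

def execute_sql(conn: sqlite3.Connection, sql: str, max_rows: int = 5) -> Tuple[bool, str]:
    """Run a query and return (succeeded, textual feedback for the next turn)."""
    try:
        rows = conn.execute(sql).fetchmany(max_rows)
        return True, repr(rows)
    except sqlite3.Error as exc:
        return False, f"error: {exc}"

def multi_turn_rollout(
    conn: sqlite3.Connection,
    question: str,
    propose_sql: Callable[[str, List[Turn]], str],
    max_turns: int = 3,  # hypothetical default, not taken from the paper
) -> List[Turn]:
    """Alternate between proposing SQL and observing execution feedback."""
    history: List[Turn] = []
    for _ in range(max_turns):
        sql = propose_sql(question, history)
        ok, feedback = execute_sql(conn, sql)
        history.append((sql, feedback))
        if ok:  # assumed stopping rule: stop at the first query that executes
            break
    return history

def stub_policy(question: str, history: List[Turn]) -> str:
    """Stand-in for the model: misspells a column first, then corrects it."""
    if not history:
        return "SELECT nme FROM singer"
    return "SELECT name FROM singer"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.execute("INSERT INTO singer VALUES ('Ana', 30)")
trajectory = multi_turn_rollout(conn, "List singer names.", stub_policy)
```

In this toy run the first turn returns an execution error, which is appended to the history, and the second turn produces a query that runs. A real system would feed the history back into the model's context rather than into a stub.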

Paper Structure

This paper contains 22 sections, 7 equations, 5 figures, 5 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the MTIR-SQL framework. The framework integrates multi-turn reasoning with execution feedback and extends GRPO with trajectory filtering to enable dynamic correction and stable training, thereby enhancing SQL generation accuracy in complex scenarios.
  • Figure 2: Compared to vanilla GRPO, our framework removes the KL constraint, introduces quality-aware rollout filtering, and extends to multi-turn reasoning with SQL execution feedback for more stable and accurate policy optimization.
  • Figure 3: Comparison of the impact of different RL methods on training and performance.
  • Figure 4: Comparison of the impact of different maximum turn limits on training and performance.
  • Figure 5: Ablation of Reward Components for MTIR-SQL on BIRD Dev.
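
The Figure 2 caption names three changes to vanilla GRPO; the two that affect the objective (rollout filtering and dropping the KL term) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the `valid` flag is a hypothetical stand-in for whatever quality criterion the authors use, the normalization follows the standard group-relative form (reward minus group mean, divided by group standard deviation), and the clip range of 0.2 is a commonly used default rather than a value from the paper.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List

@dataclass
class Trajectory:
    reward: float
    valid: bool  # hypothetical quality flag; the paper's filter criterion may differ

def filtered_group_advantages(group: List[Trajectory], eps: float = 1e-6) -> List[float]:
    """Drop filtered rollouts, then normalize rewards within the remaining group."""
    kept = [t for t in group if t.valid]
    if len(kept) < 2:
        return []  # too few rollouts to form a group-relative baseline
    rewards = [t.reward for t in kept]
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def clipped_surrogate(ratio: float, advantage: float, clip: float = 0.2) -> float:
    """PPO-style clipped term for one token; no KL penalty is added to it."""
    clipped_ratio = max(1.0 - clip, min(1.0 + clip, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

group = [Trajectory(1.0, True), Trajectory(0.0, True), Trajectory(0.5, False)]
advantages = filtered_group_advantages(group)
```

With the third rollout filtered out, the two remaining rewards normalize to advantages of roughly +1 and -1. Vanilla GRPO would subtract a weighted KL divergence to a reference policy from the surrogate; here that term is simply absent.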