MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL
Haolin Yang, Jipeng Zhang, Zhitao He, Yi R. Fung
TL;DR
MARS-SQL introduces a three-agent, interactive RL framework for Text-to-SQL that separates schema grounding, multi-turn trajectory generation, and trajectory verification to address compositional reasoning, schema understanding, and environmental grounding. The Generation Agent learns via a ReAct-style think-act-observe loop and is complemented by a Grounding Agent that prunes the schema and a Validation Agent that scores trajectories with a next-token verifier, yielding robust execution accuracy. The approach achieves new state-of-the-art results on BIRD-dev (77.84%) and Spider-test (89.75%), and demonstrates strong cross-domain generalization (Spider-DK) without Spider training data. Ablation studies confirm the necessity and synergy of all three agents and the benefits of longer interaction turns, establishing a promising direction for dependable, data-centric AI systems in complex database tasks.
Abstract
Translating natural language to SQL remains difficult for complex queries, which often require environmental interaction and self-correction. To address this, we introduce MARS-SQL, a novel multi-agent framework that combines principled task decomposition with interactive reinforcement learning (RL). Our system comprises three specialized agents: a Grounding Agent for schema linking, a Generation Agent for query generation, and a Validation Agent for final selection. The core of our framework is the Generation Agent, which is trained via a multi-turn RL policy. Adopting a ReAct-style Think-Act-Observe loop, the agent iteratively generates thoughts, executes SQL actions against a live database, and revises its strategy based on execution feedback, enabling dynamic, stateful reasoning and self-correction. At inference time, we generate multiple interaction trajectories to explore diverse reasoning paths. The Validation Agent then selects the optimal trajectory by modeling verification as a next-token prediction task and choosing the solution with the highest generation probability. This structured workflow pipelines specialized agents, combining interactive RL for generation with generative modeling for verification, and proves highly effective for robust and accurate SQL generation. Experiments show that MARS-SQL achieves state-of-the-art Execution Accuracy of 77.84% on the BIRD dev set and 89.75% on the Spider test set. Our code is available at https://github.com/YangHaolin0526/MARS-SQL.
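To make the Think-Act-Observe loop and the trajectory selection step concrete, the following is a minimal sketch and not the authors' implementation. It assumes an in-memory SQLite database as the "live database"; `toy_policy` and `toy_verifier_prob` are hypothetical stand-ins for the trained Generation Agent and for the Validation Agent's next-token probability, and the names `run_trajectory` and `select_best` are illustrative only.

```python
import sqlite3
from typing import Callable, List, Tuple

# One step of a trajectory: (thought, sql, observation).
Step = Tuple[str, str, str]
# Hypothetical policy interface: (question, history) -> (thought, sql, is_final).
Policy = Callable[[str, List[Step]], Tuple[str, str, bool]]


def run_trajectory(conn: sqlite3.Connection, policy: Policy,
                   question: str, max_turns: int = 5) -> List[Step]:
    """Roll out one Think-Act-Observe trajectory against the database."""
    history: List[Step] = []
    for _ in range(max_turns):
        thought, sql, is_final = policy(question, history)  # Think + Act
        try:
            rows = conn.execute(sql).fetchall()
            observation = repr(rows[:5])  # Observe: truncated result preview
        except sqlite3.Error as exc:
            observation = f"ERROR: {exc}"  # Observe: execution feedback
        history.append((thought, sql, observation))
        # Stop only when the policy marks the query final and it executed.
        if is_final and not observation.startswith("ERROR"):
            break
    return history


def select_best(trajectories: List[List[Step]],
                verifier_prob: Callable[[List[Step]], float]) -> List[Step]:
    """Return the trajectory with the highest verifier probability."""
    return max(trajectories, key=verifier_prob)


def toy_policy(question: str, history: List[Step]) -> Tuple[str, str, bool]:
    """Hypothetical policy: guesses a wrong column, then corrects itself."""
    if not history:
        return ("Guess the column name.",
                "SELECT COUNT(*) FROM singer WHERE nation = 'France'", True)
    return ("The previous query failed; use the 'country' column.",
            "SELECT COUNT(*) FROM singer WHERE country = 'France'", True)


def toy_verifier_prob(trajectory: List[Step]) -> float:
    """Hypothetical score: favors trajectories whose last step executed."""
    return 0.1 if trajectory[-1][2].startswith("ERROR") else 0.9


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, country TEXT)")
conn.executemany("INSERT INTO singer VALUES (?, ?)",
                 [("A", "France"), ("B", "France"), ("C", "Japan")])

traj = run_trajectory(conn, toy_policy, "How many singers are from France?")
for thought, sql, observation in traj:
    print(thought, "|", sql, "|", observation)
```

In this toy run the first query fails on the unknown column `nation`, the error text is fed back as the observation, and the second turn issues a corrected query. In a full system, `select_best` would be applied to several sampled trajectories, with the score taken from a verifier model rather than the heuristic used here.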
