Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search
Boyan Li, Jiayi Zhang, Ju Fan, Yanwei Xu, Chong Chen, Nan Tang, Yuyu Luo
TL;DR
Alpha-SQL reframes zero-shot Text-to-SQL as progressive SQL construction guided by Monte Carlo Tree Search (MCTS). It introduces LLM-as-Action-Model, which generates reasoning actions during search, and a self-supervised, execution-consistency reward that steers the search without fine-tuning. The approach combines a seven-action space, a rollout-based MCTS process, and offline/online value retrieval so that small open-source LLMs can be used effectively. On the BIRD development set, Alpha-SQL reaches 69.7% execution accuracy with a 32B open-source LLM and no fine-tuning, 2.5% above the best previous zero-shot approach, which is based on GPT-4o. It also reports competitive results on Spider and robust performance with smaller models. The work shows how test-time search and structured reasoning can narrow the gap to fine-tuned baselines in Text-to-SQL tasks.
Abstract
Text-to-SQL, which enables natural language interaction with databases, serves as a pivotal method across diverse industries. With new, more powerful large language models (LLMs) emerging every few months, fine-tuning has become incredibly costly, labor-intensive, and error-prone. As an alternative, zero-shot Text-to-SQL, which leverages the growing knowledge and reasoning capabilities encoded in LLMs without task-specific fine-tuning, presents a promising and more challenging direction. To address this challenge, we propose Alpha-SQL, a novel approach that leverages a Monte Carlo Tree Search (MCTS) framework to iteratively infer SQL construction actions based on partial reasoning states. To enhance the framework's reasoning capabilities, we introduce LLM-as-Action-Model to dynamically generate SQL construction actions during the MCTS process, steering the search toward more promising SQL queries. Moreover, Alpha-SQL employs a self-supervised reward function to evaluate the quality of candidate SQL queries, ensuring more accurate and efficient query generation. Experimental results show that Alpha-SQL achieves 69.7% execution accuracy on the BIRD development set, using a 32B open-source LLM without fine-tuning. Alpha-SQL outperforms the best previous zero-shot approach based on GPT-4o by 2.5% on the BIRD development set.
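To make the idea of a self-supervised, execution-based reward concrete, the following is a minimal sketch and not the authors' implementation. It assumes the reward for a candidate query is the fraction of independently sampled queries whose execution result matches the candidate's; the exact definition used by Alpha-SQL may differ. The `singer` table, the sample queries, and the helper names `execute` and `consistency_reward` are hypothetical and exist only for illustration.

```python
import sqlite3
from collections import Counter


def execute(conn, sql):
    """Run a query; return an order-insensitive, hashable result, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Count duplicate rows so that results are compared as multisets.
    return frozenset(Counter(rows).items())


def consistency_reward(conn, candidate_sql, sampled_sqls):
    """Assumed reward: share of sampled queries that agree with the candidate."""
    target = execute(conn, candidate_sql)
    if target is None or not sampled_sqls:
        return 0.0  # a candidate that fails to execute earns no reward
    matches = sum(1 for sql in sampled_sqls if execute(conn, sql) == target)
    return matches / len(sampled_sqls)


# Hypothetical toy database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    INSERT INTO singer VALUES (1, 'Ann', 31), (2, 'Bo', 24), (3, 'Cy', 45);
""")

# Stand-ins for queries that an LLM might sample for the same question.
samples = [
    "SELECT name FROM singer WHERE age > 30",
    "SELECT name FROM singer WHERE age >= 31",
    "SELECT name FROM singer WHERE age > 40",
    "SELECT nam FROM singer",  # invalid column: counts as a non-match
]
reward = consistency_reward(conn, "SELECT name FROM singer WHERE age > 30", samples)
print(reward)
```

No gold SQL is needed here: agreement among executed results serves as the supervision signal, which is what allows such a reward to guide search at test time without labeled data. In an MCTS setting, a value like `reward` would be backed up along the path that produced the candidate query.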
