MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search
Shuozhi Yuan, Limin Chen, Miaomiao Yuan, Zhao Jin
TL;DR
The paper tackles the challenge of performing Text-to-SQL with lightweight LLMs by introducing MCTS-SQL, a multi-stage framework that progressively reduces the search space (via a Selector), starts from a strong initial SQL query (Direct Generator), and iteratively refines queries using Monte Carlo Tree Search (MCTS-Refiner). It further accelerates inference with a token-level Prefix-Cache that reuses invariant prompt components across iterations. Empirical results on SPIDER and BIRD show that small models can outperform ChatGPT-3.5 and that, with Gemini 2.5 as the backbone, the framework reaches accuracy competitive with the state of the art, while also substantially reducing latency. The work demonstrates a practical path to deploying Text-to-SQL on edge devices without relying on large-scale models or costly APIs.
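The Prefix-Cache idea can be illustrated with a minimal sketch, assuming prompts are already tokenized into integer IDs and that each token's cached key/value state depends on every token before it. The names `TrieNode`, `TokenPrefixCache`, `encode`, and `_compute_state` are hypothetical, and `_compute_state` is only a placeholder for a model forward step; the paper's actual cache may be organized differently.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node per token; `state` stands in for that token's cached KV entry."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.state: str = ""


class TokenPrefixCache:
    """Hypothetical token-level prefix cache (illustration, not the paper's code).

    A root-to-node path spells a token prefix, so a stored state is reused
    only when the whole preceding prefix matches, as a KV cache requires.
    """

    def __init__(self) -> None:
        self.root = TrieNode()
        self.computed_tokens = 0  # token states actually computed so far

    def _compute_state(self, position: int, token: int) -> str:
        # Placeholder for one model forward step over this token (assumption).
        self.computed_tokens += 1
        return f"kv[{position}:{token}]"

    def encode(self, tokens: List[int]) -> Tuple[int, List[str]]:
        """Return (number of reused prefix tokens, per-token states)."""
        node = self.root
        states: List[str] = []
        reused = 0
        for pos, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # Miss: compute and store. Later tokens in this call miss too,
                # because this freshly created node has no children yet.
                child = TrieNode()
                child.state = self._compute_state(pos, tok)
                node.children[tok] = child
            else:
                reused += 1  # hit on the shared prefix
            states.append(child.state)
            node = child
        return reused, states


if __name__ == "__main__":
    cache = TokenPrefixCache()
    invariant = [101, 7, 8, 9, 10]  # e.g. instruction + selected-schema tokens
    r1, _ = cache.encode(invariant + [55, 56])  # first SQL candidate
    r2, _ = cache.encode(invariant + [55, 60])  # a refined candidate
    print(r1, r2, cache.computed_tokens)  # 0 6 8
```

Keying the cache on the full token prefix (a trie) rather than on individual tokens is the design choice that matters here: a transformer's cached state for a token is only valid when everything before it is identical, so reuse stops at the first divergent token. In the usage example, the second call reuses the five invariant tokens plus the shared token 55, and only one new token state is computed.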
Abstract
Text-to-SQL is a fundamental yet challenging task in NLP that aims to translate natural language questions into SQL queries. While recent advances in large language models have greatly improved performance, most existing approaches depend on models with tens of billions of parameters or costly APIs, limiting their applicability in resource-constrained environments. In real-world settings, especially on edge devices, Text-to-SQL must be cost-effective. Enabling lightweight models to perform Text-to-SQL is therefore of great practical significance. However, smaller LLMs often struggle with complex user instructions, redundant schema linking, and syntactic correctness. To address these challenges, we propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search to guide SQL generation through multi-step refinement. Because lightweight models perform poorly at single-shot prediction, we generate better results through multiple trials with feedback. However, directly applying MCTS-based methods inevitably incurs significant time and computational overhead. To address this issue, we propose a token-level prefix-cache mechanism that stores prior information across iterations, effectively improving execution speed. Experimental results on the SPIDER and BIRD benchmarks demonstrate the effectiveness of our approach. Using the small open-source Qwen2.5-Coder-1.5B, our method outperforms ChatGPT-3.5. When the more powerful Gemini 2.5 is used to explore the performance upper bound, our method achieves results competitive with the SOTA. Our findings demonstrate that, with the right strategy, even small models can be effectively deployed in practical Text-to-SQL systems.
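The multi-trial refinement loop can be sketched as a generic UCT-based Monte Carlo Tree Search over SQL candidates. This is a minimal sketch under stated assumptions: `refine` stands in for prompting a lightweight LLM to propose revised queries, `score` stands in for whatever feedback-based reward is used, and `Node`, `uct`, and `mcts_refine` are hypothetical names. The paper's definitions of nodes, actions, and rewards may differ.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    sql: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0      # sum of rewards backed up through this node
    expanded: bool = False  # True once refine() has been called on this node


def uct(child: Node, parent_visits: int, c: float) -> float:
    """UCB1 score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # try every unvisited candidate once
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts_refine(initial_sql: str,
                refine: Callable[[str], List[str]],
                score: Callable[[str], float],
                iterations: int = 20,
                c: float = 1.4) -> str:
    """Search over refinements of `initial_sql`; return the best-scoring SQL seen."""
    root = Node(initial_sql)
    best_sql, best_score = initial_sql, score(initial_sql)
    for _ in range(iterations):
        # 1. Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits, c))
        # 2. Expansion: a leaf that was already evaluated gets refined once.
        if node.visits > 0 and not node.expanded:
            node.expanded = True
            node.children = [Node(s, parent=node) for s in refine(node.sql)]
            if node.children:
                node = node.children[0]
        # 3. Evaluation: reward for the chosen candidate.
        reward = score(node.sql)
        if reward > best_score:
            best_sql, best_score = node.sql, reward
        # 4. Backpropagation: update statistics up to the root.
        current: Optional[Node] = node
        while current is not None:
            current.visits += 1
            current.value += reward
            current = current.parent
    return best_sql


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) for an LLM refiner and a feedback-based scorer.
    edits = {
        "SELECT name FROM users": ["SELECT * FROM users",
                                   "SELECT name FROM users WHERE age > 30"],
        "SELECT name FROM users WHERE age > 30": ["SELECT name FROM users WHERE age >= 30"],
    }
    rewards = {"SELECT name FROM users WHERE age > 30": 0.6,
               "SELECT name FROM users WHERE age >= 30": 1.0}
    best = mcts_refine("SELECT name FROM users",
                       refine=lambda sql: edits.get(sql, []),
                       score=lambda sql: rewards.get(sql, 0.1))
    print(best)  # SELECT name FROM users WHERE age >= 30
```

Each candidate is scored on its first visit and expanded only on its second, and unvisited children always win selection, so every proposed query is evaluated once before any of its siblings is refined further. The function returns the best-scoring query seen rather than the most-visited child, since in a refinement setting the goal is the single best SQL found.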
