MCTS-SQL: Light-Weight LLMs can Master the Text-to-SQL through Monte Carlo Tree Search

Shuozhi Yuan, Limin Chen, Miaomiao Yuan, Zhao Jin

TL;DR

The paper tackles the challenge of performing Text-to-SQL with lightweight LLMs by introducing MCTS-SQL, a multi-stage framework that progressively reduces the search space (via a Selector), starts from a strong initial SQL query (Direct Generator), and iteratively refines queries using Monte Carlo Tree Search (MCTS-Refiner). It further accelerates inference with a token-level Prefix-Cache that reuses invariant prompt components across iterations. Empirical results on SPIDER and BIRD show that small models can outperform ChatGPT-3.5 and that, with Gemini 2.5 as the base model, the framework reaches accuracy competitive with the state of the art, while achieving substantial latency reductions. The work demonstrates a practical path to deploying Text-to-SQL on edge devices without relying on large-scale models or costly APIs.

Abstract

Text-to-SQL is a fundamental yet challenging NLP task that aims to translate natural language questions into SQL queries. While recent advances in large language models have greatly improved performance, most existing approaches depend on models with tens of billions of parameters or costly APIs, limiting their applicability in resource-constrained environments. In real-world settings, especially on edge devices, Text-to-SQL must be cost-effective, so enabling light-weight models for the task is of great practical significance. However, smaller LLMs often struggle with complicated user instructions, redundant schema linking, and syntax correctness. To address these challenges, we propose MCTS-SQL, a novel framework that uses Monte Carlo Tree Search to guide SQL generation through multi-step refinement. Because light-weight models perform weakly in single-shot prediction, we generate better results through several trials with feedback. However, directly applying MCTS-based methods inevitably incurs significant time and computational overhead. To mitigate this, we propose a token-level prefix-cache mechanism that stores prior information during iterations, effectively improving execution speed. Experimental results on the SPIDER and BIRD benchmarks demonstrate the effectiveness of our approach. Using the small open-source model Qwen2.5-Coder-1.5B, our method outperforms ChatGPT-3.5. When leveraging the more powerful Gemini 2.5 to explore the performance upper bound, we achieve results competitive with the SOTA. Our findings demonstrate that even small models can be effectively deployed in practical Text-to-SQL systems with the right strategy.
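
The "several trials with feedback" idea in the abstract can be illustrated with a minimal tree-search sketch. This is not the authors' implementation: the UCT selection rule, the `max_children` limit, and the toy `critique`, `refine`, and `score` functions are all assumptions made for illustration. In a real system each of the three would be backed by an LLM call or by executing the query.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    sql: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total_reward: float = 0.0

    def uct(self, c: float = 1.4) -> float:
        # Standard UCT score (an assumed selection rule, not taken from the paper).
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def mcts_refine(
    initial_sql: str,
    critique: Callable[[str], str],
    refine: Callable[[str, str], str],
    score: Callable[[str], float],
    iterations: int = 8,
    max_children: int = 2,
) -> str:
    root = Node(initial_sql, visits=1, total_reward=score(initial_sql))
    best, best_score = root, root.total_reward
    for _ in range(iterations):
        # Selection: walk down while the current node is fully expanded.
        node = root
        while len(node.children) >= max_children:
            node = max(node.children, key=lambda n: n.uct())
        # Expansion: critique the selected query, then refine it.
        child = Node(refine(node.sql, critique(node.sql)), parent=node)
        node.children.append(child)
        # Evaluation: score the refined query and remember the best one.
        reward = score(child.sql)
        if reward > best_score:
            best, best_score = child, reward
        # Backpropagation: update statistics from the new node up to the root.
        cur: Optional[Node] = child
        while cur is not None:
            cur.visits += 1
            cur.total_reward += reward
            cur = cur.parent
    return best.sql


# Toy stand-ins (hypothetical): the "critique" names the first missing clause,
# the "refinement" appends it, and the score is the fraction of clauses present.
REQUIRED = ["SELECT name", "FROM singer", "WHERE age > 30"]


def toy_score(sql: str) -> float:
    return sum(part in sql for part in REQUIRED) / len(REQUIRED)


def toy_critique(sql: str) -> str:
    missing = [part for part in REQUIRED if part not in sql]
    return missing[0] if missing else ""


def toy_refine(sql: str, feedback: str) -> str:
    return f"{sql} {feedback}".strip()


if __name__ == "__main__":
    print(mcts_refine("SELECT name", toy_critique, toy_refine, toy_score))
```

Note that every iteration re-sends the same question and schema to the model along with a new candidate query, which is the repeated work a prefix cache is meant to avoid.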

Paper Structure

This paper contains 33 sections, 9 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Execution accuracy of MCTS-SQL compared with several existing methods. The results show that MCTS-SQL significantly enhances the performance of light-weight models, bringing them to a level comparable to some larger models. With Gemini 2.5 as the base model, it achieves results competitive with the SOTA.
  • Figure 2: The MCTS-SQL framework consists of three core components: the Selector, the Direct Generator, and the MCTS-Refiner. The Selector filters the most relevant tables and columns based on the user's intent. The Direct Generator produces an initial SQL query. The MCTS-Refiner is activated when the initial SQL query fails both execution and LLM-based verification checks; it applies iterative trial-and-feedback optimization to refine the query progressively.
  • Figure 3: The main workflow of the proposed MCTS-Refiner. The SQL generated in the previous step first receives a critique. A refinement is then produced based on that critique, which expands the search tree. If the iterations are complete, the node with the best score is selected as the final output; otherwise, the node is added to the search tree and backpropagation is performed.
  • Figure 4: Illustration of the proposed optimization strategies. (a) MCTS-based iterative refinement for SQL generation. (b) Prefix-caching mechanism that reuses cached K/V states for invariant prompt components, reducing redundant computation and improving efficiency across multiple iterations.
  • Figure 5: An example of the proposed database schema format. The format consists of table names, descriptions, and column-level details (name, data type, description, and examples) to represent the hierarchical information of databases.
  • ...and 1 more figure
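
The hierarchical schema format described in the Figure 5 caption (table name and description, then per-column name, data type, description, and examples) can be sketched as a small rendering function. The exact textual layout used in the paper is not given here, so the line format, the `render_schema` name, and the example `singer` table below are assumptions for illustration only.

```python
from typing import List, TypedDict


class Column(TypedDict):
    name: str
    dtype: str
    description: str
    examples: List[str]


class Table(TypedDict):
    name: str
    description: str
    columns: List[Column]


def render_schema(tables: List[Table]) -> str:
    """Render tables as indented text: one header line per table,
    followed by one line per column (hypothetical layout)."""
    lines: List[str] = []
    for table in tables:
        lines.append(f"Table: {table['name']} ({table['description']})")
        for col in table["columns"]:
            examples = ", ".join(col["examples"])
            lines.append(
                f"  - {col['name']} [{col['dtype']}]: "
                f"{col['description']}; examples: {examples}"
            )
    return "\n".join(lines)


# Hypothetical example data, not taken from SPIDER or BIRD.
EXAMPLE: List[Table] = [
    {
        "name": "singer",
        "description": "Registered singers",
        "columns": [
            {"name": "name", "dtype": "TEXT",
             "description": "Full name", "examples": ["Joe Sharp"]},
            {"name": "age", "dtype": "INTEGER",
             "description": "Age in years", "examples": ["25", "41"]},
        ],
    }
]

if __name__ == "__main__":
    print(render_schema(EXAMPLE))
```

Keeping descriptions and example values next to each column gives a small model concrete hints about what a column holds, and a Selector-style step could drop whole tables or columns from the list before rendering.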