Monte Carlo Tree Search with Reasoning Path Refinement for Small Language Models in Conversational Text-to-NoSQL

Xubang Xiong; Raymond Chi-Wing Wong; Yuanfeng Song

TL;DR

This work tackles the challenge of conversational, multi-turn NoSQL querying by introducing Stage-MCTS, a framework that endows small language models with NoSQL-specific reasoning through Monte Carlo Tree Search–guided data augmentation, progressive supervised fine-tuning, and iterative self-training. It frames NoSQL query generation as a search over executable stages, using stage-augmented Chain-of-Thoughts to produce interpretable, actionable steps, and employs a rule-based reward to steer data collection. The authors construct CoNoSQL, a large cross-domain dataset with over 2,000 dialogues and 150 databases, to evaluate generalization across schemas and domains. Empirical results show Stage-MCTS outperforming state-of-the-art large reasoning models in Execution Value Match by up to 7.93%, with a 7B-parameter backbone achieving performance comparable to larger models, demonstrating that carefully designed reasoning and training strategies can close the gap between small and large models for NoSQL query generation. The approach promises practical impact for accessible, context-aware data exploration in NoSQL ecosystems and lays groundwork for integrating conversational query systems into broader data analytics workflows.

Abstract

NoSQL databases have been widely adopted in big data analytics, geospatial applications, and healthcare services due to their flexibility and scalability. However, querying NoSQL databases requires specialized technical expertise, creating a high barrier for users. While recent studies have explored the text-to-NoSQL problem, they primarily focus on single-turn interactions, ignoring the conversational nature of real-world queries. To bridge this gap, we introduce the Conversational Text-to-NoSQL task, which generates NoSQL queries given a natural language question, a NoSQL database, and the dialogue history. To address this task, we propose Stage-MCTS, a framework that endows small language models (SLMs) with NoSQL-specific reasoning capabilities by formulating query generation as a search problem. The framework employs Monte Carlo Tree Search (MCTS) guided by a rule-based reward to produce stepwise reasoning data, followed by progressive supervised fine-tuning (SFT) and self-training strategies. We further construct CoNoSQL, a cross-domain dataset with over 2,000 dialogues and 150 databases, to support evaluation. Experiments demonstrate that our approach outperforms state-of-the-art large reasoning models, improving execution value match (EVM) accuracy by up to 7.93%.
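To make the task definition concrete, the following is a minimal sketch, not taken from the paper, of the task inputs (question, database, dialogue history) and of how a MongoDB-style aggregation pipeline can be viewed as a sequence of stages. The names `Turn`, `ConversationalExample`, and `partial_pipelines`, as well as the `shop`/`orders` data, are hypothetical and chosen only for illustration. The sketch relies on the assumption that a prefix of an aggregation pipeline is generally itself a valid pipeline, which is one way to read "search over executable stages".

```python
from dataclasses import dataclass, field
from typing import Any

# One aggregation stage, e.g. {"$match": {...}} (MongoDB-style, illustrative).
Stage = dict[str, Any]


@dataclass
class Turn:
    """One earlier dialogue turn: the user question and the system's query."""
    question: str
    pipeline: list[Stage]


@dataclass
class ConversationalExample:
    """Hypothetical container for the three task inputs."""
    database: str
    collection: str
    history: list[Turn] = field(default_factory=list)
    question: str = ""


def partial_pipelines(pipeline: list[Stage]) -> list[list[Stage]]:
    """Return every non-empty prefix of a pipeline, shortest first.

    Each prefix can be treated as an intermediate search state that
    ends after one more stage than the previous state.
    """
    return [pipeline[:i] for i in range(1, len(pipeline) + 1)]


# Illustrative two-turn dialogue (invented data, not from CoNoSQL).
example = ConversationalExample(
    database="shop",
    collection="orders",
    history=[
        Turn(
            question="How many orders are there per customer?",
            pipeline=[{"$group": {"_id": "$customer_id", "n": {"$sum": 1}}}],
        )
    ],
    question="Only show customers with more than five orders, most first.",
)

# A plausible target query for the current turn, written as stages.
target = [
    {"$group": {"_id": "$customer_id", "n": {"$sum": 1}}},
    {"$match": {"n": {"$gt": 5}}},
    {"$sort": {"n": -1}},
]

states = partial_pipelines(target)
```

Here `states` holds three intermediate pipelines, ending with the full target query; a search procedure would extend one state into the next by proposing an additional stage.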

Paper Structure (54 sections, 9 equations, 17 figures, 13 tables)

Introduction
Preliminaries
Task
NoSQL Query Languages
Methodology
Overview
Stage-augmented CoT Generation
Framing Text-to-NoSQL as a Search Problem
Stage-augmented Chain-of-Thoughts
Improved MCTS for Data Augmentation
Reward-based Sampling and Path Refinement
Three-Phase Supervised Fine-Tuning
Self-training Pipeline
MCTS-based Test-time Scaling
Framework Verification
...and 39 more sections

Figures (17)

Figure 1: A dialogue from the CoNoSQL dataset. MongoDB is used in this scenario as a representative NoSQL database. $Q_{i}$ denotes the user question in turn $i$, and $A_{i}$ denotes the system's response (usually a NoSQL query).
Figure 2: NoSQL query example.
Figure 3: A small language model (SLM) is employed to generate multiple reasoning paths using a rule-based reward model. From these paths, the top-k correct paths are selected via reward-based sampling. If no correct path is yielded in a sampling step, the incorrect paths are refined with the ground-truth. The collected reasoning paths are subsequently used to iteratively train the SLM. Finally, the self-trained SLM is applied to predict the NoSQL query with test-time scaling based on MCTS.
Figure 4: Framing Text-to-NoSQL as a Search Problem
Figure 5: Overview of the dataset construction pipeline. Data from relational databases is converted to NoSQL-compatible data. Incorrectly transformed data is refined using a dialogue-based RAG approach. Quality control measures, including manual review, are implemented to ensure data quality.
...and 12 more figures
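
The selection step described in the Figure 3 caption (keep the top-k correct paths by reward, and fall back to refining incorrect paths with the ground truth when none is correct) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Path`, `select_paths`, and `refine_with_ground_truth` are hypothetical names, correctness is approximated by exact string match against the ground-truth query, and the refinement shown is a placeholder that appends a correction step. Limiting the refined paths to the k highest-reward ones is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Path:
    """A sampled reasoning path: its steps, final query, and reward."""
    steps: tuple[str, ...]
    query: str
    reward: float


def refine_with_ground_truth(path: Path, gold: str) -> Path:
    # Placeholder refinement (assumption): keep the reasoning steps and
    # append a correction that swaps in the ground-truth query.
    correction = f"Corrected final query to: {gold}"
    return replace(path, steps=path.steps + (correction,), query=gold)


def select_paths(
    paths: list[Path],
    gold: str,
    k: int,
    is_correct: Callable[[str, str], bool] = lambda q, g: q == g,
    refine: Callable[[Path, str], Path] = refine_with_ground_truth,
) -> list[Path]:
    """Return up to k training paths for one question."""
    ranked = sorted(paths, key=lambda p: p.reward, reverse=True)
    correct = [p for p in ranked if is_correct(p.query, gold)]
    if correct:
        # Reward-based sampling: the k highest-reward correct paths.
        return correct[:k]
    # No correct path was sampled: refine the incorrect ones instead.
    return [refine(p, gold) for p in ranked[:k]]


sampled = [
    Path(("group by customer",), "q_wrong", 0.2),
    Path(("group by customer", "filter n > 5"), "q_gold", 0.9),
    Path(("filter n > 5",), "q_gold", 0.5),
]
kept = select_paths(sampled, gold="q_gold", k=1)
fallback = select_paths(sampled[:1], gold="q_gold", k=1)
```

In this sketch, `kept` contains the single correct path with the highest reward, while `fallback` contains a refined copy of the only (incorrect) path; in an actual pipeline `is_correct` would be replaced by an execution-based check.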
