FastMCTS: A Simple Sampling Strategy for Data Synthesis
Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo
TL;DR
This paper tackles the inefficiency and lack of step-level supervision in rejection sampling for creating synthetic multi-step reasoning data for LLMs. It introduces FastMCTS, an MCTS-inspired sampling framework that combines an Adaptive Stay Policy, Dynamic Exploration, Reserve Simulation, and an LLM verifier for robustness, enabling step-level supervision and tree-structured data for Branch-DPO. Across English and Chinese math datasets, FastMCTS generates over 30% more correct reasoning paths than rejection sampling and improves downstream training performance by approximately 3.9% under comparable budgets, while sampling more evenly across problems of different difficulty. The tree-structured data can also be used for branch- and step-level optimization, making FastMCTS a practical, scalable alternative to rejection sampling for high-quality reasoning data.
Abstract
Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths compared to rejection sampling as the number of generated tokens scales up. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9% across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.
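
The imbalance attributed to rejection sampling above can be illustrated with a toy model. The sketch below is not the authors' code and does not implement FastMCTS. It assumes, purely for illustration, that each problem has a fixed per-attempt success probability (`solve_prob`, a hypothetical quantity) and that every attempt is drawn independently with the same budget `k`. The function names and the example probabilities are likewise made up.

```python
import random


def rejection_sample(solve_prob, k, rng):
    """Draw k independent attempts and count those that pass the answer check.

    solve_prob is a hypothetical per-attempt success rate standing in for
    a real model plus answer checker.
    """
    return sum(1 for _ in range(k) if rng.random() < solve_prob)


def expected_correct(solve_prob, k):
    """Expected number of kept (correct) trajectories out of k attempts."""
    return k * solve_prob


def prob_no_correct(solve_prob, k):
    """Probability that all k independent attempts fail."""
    return (1.0 - solve_prob) ** k


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    k = 16                  # same sampling budget for every problem
    # Illustrative difficulty levels, not taken from the paper.
    for name, p in [("easy", 0.9), ("medium", 0.5), ("hard", 0.05)]:
        kept = rejection_sample(p, k, rng)
        print(
            f"{name:6s} kept={kept:2d} "
            f"expected={expected_correct(p, k):.2f} "
            f"P(no correct path)={prob_no_correct(p, k):.3f}"
        )
```

Under these assumptions, a uniform budget yields many correct trajectories for easy problems and few or none for hard ones, and every failed attempt is discarded whole, with no signal about which step went wrong. This is the gap that a tree-based sampler with step-level evaluation aims to close.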
