FastMCTS: A Simple Sampling Strategy for Data Synthesis

Peiji Li, Kai Lv, Yunfan Shao, Yichuan Ma, Linyang Li, Xiaoqing Zheng, Xipeng Qiu, Qipeng Guo

TL;DR

This paper addresses two weaknesses of rejection sampling when it is used to create synthetic multi-step reasoning data for LLMs: inefficiency and the lack of step-level supervision. It introduces FastMCTS, an MCTS-inspired sampling framework that combines an Adaptive Stay Policy, Dynamic Exploration, Reserve Simulation, and an LLM verifier for robustness. The framework provides step-level supervision and produces tree-structured data that can be used for Branch-DPO. Across English and Chinese math datasets, FastMCTS yields over 30% more correct reasoning paths than rejection sampling and improves downstream training performance by approximately 3.9% under comparable budgets, while sampling more evenly across problem difficulty levels. The tree-derived data also supports branch- and step-level optimization, making FastMCTS a practical, scalable alternative to rejection sampling for high-quality reasoning data.

Abstract

Synthetic high-quality multi-step reasoning data can significantly enhance the performance of large language models on various tasks. However, most existing methods rely on rejection sampling, which generates trajectories independently and suffers from inefficiency and imbalanced sampling across problems of varying difficulty. In this work, we introduce FastMCTS, an innovative data synthesis strategy inspired by Monte Carlo Tree Search. FastMCTS provides a more efficient sampling method for multi-step reasoning data, offering step-level evaluation signals and promoting balanced sampling across problems of different difficulty levels. Experiments on both English and Chinese reasoning datasets demonstrate that FastMCTS generates over 30% more correct reasoning paths compared to rejection sampling as the number of generated tokens scales up. Furthermore, under comparable synthetic data budgets, models trained on FastMCTS-generated data outperform those trained on rejection sampling data by 3.9% across multiple benchmarks. As a lightweight sampling strategy, FastMCTS offers a practical and efficient alternative for synthesizing high-quality reasoning data. Our code will be released soon.
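
To make the rejection sampling baseline concrete, the following is a minimal toy sketch, not the authors' code. It assumes each problem can be summarized by a single hypothetical number, `solve_prob`, the chance that one independently sampled trajectory ends in a correct answer. The function names, the three difficulty labels, and the probabilities are all illustrative assumptions. Under a fixed per-problem budget, easy problems tend to accumulate many kept trajectories while hard problems keep few, which is the imbalance the abstract describes.

```python
import random


def rejection_sample(solve_prob, num_samples, rng):
    """Count how many of num_samples independent trajectories are kept.

    solve_prob is a hypothetical stand-in for the probability that a single
    sampled trajectory reaches the correct final answer. Trajectories share
    no prefix, so every sample pays its full generation cost.
    """
    kept = 0
    for _ in range(num_samples):
        # Each trajectory is drawn independently and kept only if correct.
        if rng.random() < solve_prob:
            kept += 1
    return kept


def run(seed=0, num_samples=16):
    """Apply the same sampling budget to three illustrative problems."""
    rng = random.Random(seed)
    # Assumed per-problem success probabilities, for illustration only.
    problems = {"easy": 0.9, "medium": 0.5, "hard": 0.05}
    return {
        name: rejection_sample(prob, num_samples, rng)
        for name, prob in problems.items()
    }


if __name__ == "__main__":
    for name, kept in run().items():
        print(f"{name}: kept {kept} of 16 trajectories")
```

Because every trajectory is generated from scratch and only the final answer is checked, this baseline provides no signal about which intermediate step went wrong. A tree-based sampler can instead reuse shared prefixes and attach evaluation signals to individual steps.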

Paper Structure

This paper contains 36 sections, 8 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Comparison of generation efficiency of three sampling algorithms. "#Verified Tokens" represents the total tokens in all verified correct trajectories.
  • Figure 2: Overview of one iteration of FastMCTS.
  • Figure 3: Comparison of sampling efficiency for FastMCTS and Rejection Sampling.
  • Figure 4: Comparison of sampling balance across difficulty levels for Rejection Sampling and FastMCTS.
  • Figure 5: Example of the Prompt Template Used for Model Evaluation