Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo

Shengyu Feng, Xiang Kong, Shuang Ma, Aonan Zhang, Dong Yin, Chong Wang, Ruoming Pang, Yiming Yang

TL;DR

This work targets reliable multi-step reasoning in LLMs by introducing Twisted Sequential Monte Carlo (TSMC) as a verification mechanism. TSMC uses intermediate twist functions tied to a value function to steer sampling toward promising partial solutions, which yields unbiased, lower-variance estimates of final solution quality. The value function is learned with Contrastive Twist Learning (CTL), so no step-by-step human supervision is required. The method connects to process reward models (PRMs) while reducing their supervision burden, and results on GSM8K and MATH500 show consistent improvements over both verification and non-verification baselines across multiple generators. The approach offers a scalable, supervision-light framework for improving mathematical reasoning in LLMs, with gains in both sampling efficiency and accuracy.

Abstract

Augmenting the multi-step reasoning abilities of Large Language Models (LLMs) has been a persistent challenge. Recently, verification has shown promise in improving solution consistency by evaluating generated outputs. However, current verification approaches suffer from sampling inefficiencies, requiring a large number of samples to achieve satisfactory performance. Additionally, training an effective verifier often depends on extensive process supervision, which is costly to acquire. In this paper, we address these limitations by introducing a novel verification method based on Twisted Sequential Monte Carlo (TSMC). TSMC sequentially refines its sampling effort to focus exploration on promising candidates, resulting in more efficient generation of high-quality solutions. We apply TSMC to LLMs by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations. We empirically demonstrate the advantages of our method across multiple math benchmarks, and also validate our theoretical analysis of both our approach and existing verification methods.
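The resampling idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's implementation: the "generator" emits binary steps instead of text, the names `T`, `P_GOOD`, `NEEDED`, `value`, and `twisted_smc` are hypothetical, and the expected future reward is computed exactly rather than learned. It uses the generic twisted SMC recipe in which the proposal is the generator itself and each partial solution is reweighted by the ratio of its twist after and before the new step.

```python
import math
import random

# Hypothetical toy setting (an assumption for illustration, not from the paper).
T = 8          # number of "reasoning steps" per solution
P_GOOD = 0.6   # chance that the toy generator emits a good step
NEEDED = 7     # a solution is "correct" if it has at least this many good steps


def value(prefix):
    """Exact expected final reward of a partial solution under the toy generator."""
    good, remaining = sum(prefix), T - len(prefix)
    need = max(NEEDED - good, 0)
    # Binomial tail: probability of getting at least `need` more good steps.
    return sum(
        math.comb(remaining, k) * P_GOOD**k * (1 - P_GOOD) ** (remaining - k)
        for k in range(need, remaining + 1)
    )


def sample_step(rng):
    return 1 if rng.random() < P_GOOD else 0


def plain_sampling(n, rng):
    """Baseline: generate n full solutions with no intermediate verification."""
    return [[sample_step(rng) for _ in range(T)] for _ in range(n)]


def twisted_smc(n, rng):
    """Extend, reweight, and resample n partial solutions at every step."""
    particles = [[] for _ in range(n)]
    for _ in range(T):
        prev = [value(p) for p in particles]
        particles = [p + [sample_step(rng)] for p in particles]
        # Incremental weight: twist of the new prefix over twist of the old one.
        weights = [value(p) / v if v > 0 else 0.0 for p, v in zip(particles, prev)]
        if sum(weights) == 0:
            weights = [1.0] * n  # degenerate case: fall back to uniform resampling
        # Multinomial resampling; weights are implicitly reset to uniform afterwards.
        particles = [list(p) for p in rng.choices(particles, weights=weights, k=n)]
    return particles


def fraction_correct(solutions):
    return sum(sum(s) >= NEEDED for s in solutions) / len(solutions)


if __name__ == "__main__":
    rng = random.Random(0)
    print("plain sampling:", fraction_correct(plain_sampling(64, rng)))
    print("twisted SMC:   ", fraction_correct(twisted_smc(64, rng)))
```

Because the toy value function is exact, prefixes that can no longer reach a correct solution receive zero weight and are dropped early, so the sampling budget concentrates on promising candidates. With a learned, approximate value function the effect would be weaker, and the choice of twist matters.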

Paper Structure

This paper contains 47 sections, 6 theorems, 35 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Proposition 3.1

For importance sampling (IS) with the target $\sigma(\mathbf{x}_{1:T})$ and proposal $q(\mathbf{x}_{1:T})$, up to a constant $C$ independent of $q(\mathbf{x}_{1:T})$, the following identity in the variance holds for the set of all answers $\mathcal{A}$:

Figures (5)

  • Figure 1: IS-based verification vs. TSMC-based verification. (a) Typical IS-based verification weights (verifies) solutions only after they are fully generated, so incorrect solutions are often generated with high probability, i.e., sampling efficiency is low. (b) Our TSMC-based verification weights and resamples partial solutions at each step of the generation process. This sequential resampling reduces the discrepancy between the proposal and target distributions, improving the overall correctness of the generated solutions and thus the sampling efficiency.
  • Figure 2: Comparison among all biased and unbiased estimators of the importance weight.
  • Figure 3: TSMC with different intermediate targets. Variance is visualized across many sub-samples of the 240 solutions per problem.
  • Figure 4: Ablation study on the TSMC batch size. Variance is visualized across many sub-samples of the 240 solutions per problem.
  • Figure 5: Comparison with non-sampling algorithms. Variance is visualized across many sub-samples of the 240 solutions per problem.
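The captions above mention variance measured across sub-samples of a fixed pool of 240 solutions per problem. One plausible way to compute such a spread is sketched below, under assumptions: the helper names `weighted_vote` and `subsample_accuracy`, the sub-sample size `k`, and the number of trials are hypothetical, and the paper's exact protocol is not stated here. Each trial draws `k` solutions without replacement, aggregates their final answers by verifier-weighted voting, and records whether the voted answer is correct.

```python
import random
import statistics
from collections import defaultdict


def weighted_vote(answers, weights):
    """Return the answer with the largest total weight."""
    totals = defaultdict(float)
    for answer, weight in zip(answers, weights):
        totals[answer] += weight
    return max(totals, key=totals.get)


def subsample_accuracy(answers, weights, truth, k, trials, rng):
    """Mean and standard deviation of voting correctness over random sub-samples."""
    hits = []
    for _ in range(trials):
        pick = rng.sample(range(len(answers)), k)  # k distinct solutions
        voted = weighted_vote([answers[i] for i in pick], [weights[i] for i in pick])
        hits.append(1.0 if voted == truth else 0.0)
    return statistics.mean(hits), statistics.pstdev(hits)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic pool of 240 solutions for one problem (made-up data for illustration).
    answers = [rng.choice(["42", "41", "40"]) for _ in range(240)]
    weights = [0.9 if a == "42" else 0.3 for a in answers]
    print(subsample_accuracy(answers, weights, truth="42", k=16, trials=200, rng=rng))
```

Setting all weights to 1.0 reduces this to plain majority voting, which gives a non-verification baseline for the same pool.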

Theorems & Definitions (10)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition A.1
  • proof
  • Proposition A.1
  • proof
  • Definition A.1
  • Proposition A.1
  • proof