SSR: Speculative Parallel Scaling Reasoning in Test-time

Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

TL;DR

The paper tackles the efficiency–accuracy trade-off of test-time reasoning in large language models solving multi-step math problems. It introduces SSR, a training-free framework that combines a Selective Parallel Module (SPM) with Step-level Speculative Decoding (SSD) to accelerate reasoning while preserving correctness. SSR selects a small, diverse set of reasoning strategies; for each one, a lightweight draft model generates candidate steps that the target model verifies or revises, which enables batched parallel inference and adaptive fast modes. Results on AIME 2024, MATH-500, and LiveMathBench show substantial compute reductions (e.g., to $30\%$ of baseline FLOPs on MATH-500 and $80.5\%$ on LiveMathBench) with little to no loss in accuracy, and in some cases significant accuracy gains (e.g., +13.84% pass@1 on LiveMathBench).

Abstract

Large language models (LLMs) have achieved impressive results on multi-step mathematical reasoning, yet at the cost of high computational overhead. This challenge is particularly acute for test-time scaling methods such as parallel decoding, which increase answer diversity but scale poorly in efficiency. To address this efficiency-accuracy trade-off, we propose SSR (Speculative Parallel Scaling Reasoning), a training-free framework that leverages a key insight: by introducing speculative decoding at the step level, we can accelerate reasoning without sacrificing correctness. SSR integrates two components: a Selective Parallel Module (SPM) that identifies a small set of promising reasoning strategies via model-internal scoring, and Step-level Speculative Decoding (SSD), which enables efficient draft-target collaboration for fine-grained reasoning acceleration. Experiments on three mathematical benchmarks (AIME 2024, MATH-500, and LiveMathBench) demonstrate that SSR achieves strong gains over baselines. For instance, on LiveMathBench, SSR improves pass@1 accuracy by 13.84% while reducing computation to 80.5% of the baseline FLOPs. On MATH-500, SSR reduces compute to only 30% of the baseline with no loss in accuracy.
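
The draft-target collaboration described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the callables `draft_step`, `score_step`, `rewrite_step`, and `is_final`, the acceptance `threshold`, and the `max_steps` cap are all hypothetical names and values introduced here for illustration. The only detail taken from the paper summary is that steps receive a score, which the Figure 5 caption places on a 0 to 9 scale.

```python
from typing import Callable, List


def step_level_speculative_decode(
    draft_step: Callable[[List[str]], str],       # hypothetical: draft model proposes the next step
    score_step: Callable[[List[str], str], int],  # hypothetical: target model scores a step (0-9)
    rewrite_step: Callable[[List[str]], str],     # hypothetical: target model writes the step itself
    is_final: Callable[[str], bool],              # hypothetical: detects the final-answer step
    threshold: int = 7,                           # assumed acceptance threshold, not from the paper
    max_steps: int = 32,                          # assumed safety cap on reasoning length
) -> List[str]:
    """Build a reasoning trace one step at a time.

    The cheap draft model proposes each step; the target model only
    regenerates a step when its own score for the draft falls below
    the threshold.
    """
    steps: List[str] = []
    for _ in range(max_steps):
        candidate = draft_step(steps)
        if score_step(steps, candidate) < threshold:
            # Draft step rejected: fall back to the target model for this step.
            candidate = rewrite_step(steps)
        steps.append(candidate)
        if is_final(candidate):
            break
    return steps


# Toy stand-ins for the two models: the second draft step gets a low score.
trace = step_level_speculative_decode(
    draft_step=lambda s: f"d{len(s)}" + (" ANSWER" if len(s) == 2 else ""),
    score_step=lambda s, c: 3 if len(s) == 1 else 9,
    rewrite_step=lambda s: f"t{len(s)}",
    is_final=lambda c: c.endswith("ANSWER"),
)
print(trace)
```

In this toy run the target model rewrites only the second step, so the trace mixes two accepted draft steps with one target-written step. The compute saving in such a scheme would come from the fraction of steps that pass verification without being regenerated.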

Paper Structure

This paper contains 44 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of SSR framework. At test time, the target model first selects a subset of strategies from a curated strategy pool. For each selected method prompt, the draft model generates step-by-step reasoning. Each step is validated and optionally revised by the target model under a step-level speculative decoding scheme. All inference is executed in parallel with batched processing for efficiency.
  • Figure 2: Accuracy vs. number of parallel reasoning paths on AIME, MATH-500, and LiveMathBench using Qwen/QwQ-32B. Accuracy gains plateau as parallel count increases, indicating diminishing returns.
  • Figure 3: Comparison of different inference strategies on AIME 2024, MATH-500, and LiveMathBench. Each sub-plot shows the trade-off between Computational Efficiency (x-axis: inverse of normalized FLOPs; higher is better) and Accuracy (y-axis: pass@1; higher is better). Points closer to the top-right corner represent more desirable performance.
  • Figure 4: Ablation on the effect of Selective Parallel Module (SPM) across datasets. Each group of bars compares Baseline, Parallel, and Parallel-SPM settings (all without SSD, with $N=5$).
  • Figure 5: Step score distribution (0–9) across AIME, MATH, and LiveMathBench, using our SSD strategy. Bars represent per-score proportions; the overlaid curve shows the cumulative distribution.
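
The quantities plotted in Figure 5 (per-score proportions and their cumulative distribution over the 0–9 scale) can be computed as in the sketch below. The function name `score_distribution` and the sample scores are hypothetical and are not taken from the paper's data.

```python
from collections import Counter
from itertools import accumulate
from typing import List, Sequence, Tuple


def score_distribution(scores: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (proportions, cumulative) over the integer scores 0..9."""
    if not scores:
        raise ValueError("scores must be non-empty")
    if any(s < 0 or s > 9 for s in scores):
        raise ValueError("scores must lie in the range 0..9")
    counts = Counter(scores)
    total = len(scores)
    # Proportion of steps that received each score (the bars in such a plot).
    proportions = [counts.get(s, 0) / total for s in range(10)]
    # Running sum of the proportions (the overlaid cumulative curve).
    cumulative = list(accumulate(proportions))
    return proportions, cumulative


# Hypothetical step scores, for illustration only.
props, cum = score_distribution([9, 9, 8, 7, 9, 3, 8, 9, 6, 9])
print(props)
print(cum)
```

Reading the cumulative curve at a given score shows what share of draft steps would fall at or below that score, which is the share a verification threshold set just above it would send back to the target model.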