SSR: Speculative Parallel Scaling Reasoning in Test-time

Yuanlin Chu, Bo Wang, Xiang Liu, Hong Chen, Aiwei Liu, Xuming Hu

TL;DR

The paper tackles the efficiency-accuracy bottleneck of test-time reasoning in large language models solving multi-step math problems. It introduces SSR, a training-free framework that combines a Selective Parallel Module (SPM) with Step-level Speculative Decoding (SSD) to accelerate reasoning while preserving correctness. SPM selects a small, diverse set of reasoning strategies, and SSD generates candidate steps with a lightweight draft model that the target model then verifies or revises; together they enable batched parallel inference and adaptive fast modes. On AIME 2024, MATH-500, and LiveMathBench, SSR substantially reduces compute (e.g., to 30% of baseline compute on MATH-500 and to 80.5% on LiveMathBench) with little or no loss in accuracy, and in some cases improves accuracy markedly (e.g., +13.84% pass@1 on LiveMathBench). These results indicate that SSR delivers efficient, reliable mathematical reasoning without any additional training.
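
The step-level draft-and-verify loop described above can be illustrated with a minimal Python sketch. This summary does not specify how the target model judges a draft step, so the callables `draft_step`, `target_score`, and `target_step`, the acceptance `threshold`, and the `ANSWER` stop marker are all hypothetical; the toy stubs stand in for real model calls and are not the paper's implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not the paper's API):
#   draft_step(context)         -> next step proposed by a cheap draft model
#   target_score(context, step) -> target model's score in [0, 1] for that step
#   target_step(context)        -> next step written by the target model itself


def speculative_step_decode(
    problem: str,
    draft_step: Callable[[str], str],
    target_score: Callable[[str, str], float],
    target_step: Callable[[str], str],
    threshold: float = 0.7,
    max_steps: int = 10,
    stop_marker: str = "ANSWER",
) -> Tuple[List[str], int]:
    """Build a solution step by step; return the steps and the number of accepted draft steps."""
    context = problem
    steps: List[str] = []
    accepted = 0
    for _ in range(max_steps):
        proposal = draft_step(context)
        if target_score(context, proposal) >= threshold:
            step = proposal  # target accepts the cheap draft step as-is
            accepted += 1
        else:
            step = target_step(context)  # target rejects it and writes the step itself
        steps.append(step)
        context += "\n" + step
        if stop_marker in step:  # stop once a final answer has been produced
            break
    return steps, accepted


# Toy stubs standing in for real models (purely illustrative).
def toy_draft(ctx: str) -> str:
    n = ctx.count("\n")
    return "ANSWER: 42" if n == 3 else f"draft step {n}"


def toy_score(ctx: str, step: str) -> float:
    return 0.1 if step == "draft step 1" else 0.9  # the target dislikes one draft step


def toy_target(ctx: str) -> str:
    n = ctx.count("\n")
    return f"target step {n}"


steps, accepted = speculative_step_decode("Solve for x.", toy_draft, toy_score, toy_target)
print(steps)     # ['draft step 0', 'target step 1', 'draft step 2', 'ANSWER: 42']
print(accepted)  # 3
```

In this sketch the target model only scores most steps and generates text itself only when a draft step is rejected, which is where the savings of speculative decoding come from; a real system would also batch these calls across the selected strategies.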

Abstract

Large language models (LLMs) have achieved impressive results on multi-step mathematical reasoning, but at a high computational cost. This challenge is particularly acute for test-time scaling methods such as parallel decoding, which increase answer diversity but scale poorly in efficiency. To address this efficiency-accuracy trade-off, we propose SSR (Speculative Parallel Scaling Reasoning), a training-free framework built on a key insight: by applying speculative decoding at the step level, we can accelerate reasoning without sacrificing correctness. SSR integrates two components: a Selective Parallel Module (SPM) that identifies a small set of promising reasoning strategies via model-internal scoring, and Step-level Speculative Decoding (SSD), which enables efficient draft-target collaboration for fine-grained reasoning acceleration. Experiments on three mathematical benchmarks (AIME 2024, MATH-500, and LiveMathBench) show that SSR achieves strong gains over baselines. For instance, on LiveMathBench, SSR improves pass@1 accuracy by 13.84% while reducing computation to 80.5% of the baseline FLOPs. On MATH-500, SSR reduces compute to only 30% of the baseline with no loss in accuracy.
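
The abstract does not say which model-internal score SPM uses. As a hedged illustration only, the sketch below assumes the score is the mean token log-probability the model assigns to a short description of each candidate strategy, and keeps the top `k`; `select_strategies`, `mean_logprob`, and the example strategy names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized log-probability (an assumed stand-in for model-internal scoring)."""
    return sum(token_logprobs) / len(token_logprobs)


def select_strategies(candidates: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    """Return the k candidate strategies with the highest mean log-probability."""
    # Sort by descending score; break ties alphabetically so the result is deterministic.
    ranked = sorted(candidates, key=lambda name: (-mean_logprob(candidates[name]), name))
    return ranked[:k]


# Hypothetical per-token log-probabilities for four strategy descriptions.
candidates = {
    "algebraic": [-0.2, -0.4],
    "casework": [-1.0, -1.2],
    "geometric": [-0.5, -0.3],
    "brute_force": [-2.0],
}
print(select_strategies(candidates, k=2))  # ['algebraic', 'geometric']
```

This ranking uses a single score per strategy and does not enforce the diversity of the selected set, so it should be read as a simplified stand-in for SPM rather than a description of it.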
