ARS: Adaptive Reasoning Suppression for Efficient Large Reasoning Language Models

Dongqi Zheng

TL;DR

ARS targets overthinking in large reasoning language models (LRLMs) with a training-free adaptive suppression strategy guided by certainty signals measured at multiple checkpoints. It combines multi-checkpoint certainty estimation, progressive thresholding, and dynamic suppression to prune redundant reasoning steps while preserving accuracy. The approach comes with theoretical efficiency guarantees and reduces tokens, latency, and energy by up to 53%, 46.1%, and 57.9%, respectively, with competitive accuracy across math benchmarks and model architectures. This makes LRLM deployment more practical for mathematical and other reasoning tasks and suggests directions for extending adaptive certainty mechanisms to broader reasoning paradigms.

Abstract

Large Reasoning Language Models (LRLMs or LRMs) demonstrate remarkable capabilities in complex reasoning tasks but suffer from significant computational inefficiency due to overthinking. Existing efficient reasoning methods face the challenge of balancing reasoning quality against inference cost reduction. We propose Adaptive Reasoning Suppression (ARS), a novel training-free approach that dynamically suppresses redundant reasoning steps while preserving accuracy through adaptive certainty monitoring. ARS introduces a multi-checkpoint certainty estimation mechanism with progressive suppression thresholds, achieving superior efficiency compared to static suppression methods. Our extensive evaluation across mathematical reasoning benchmarks using multiple model architectures demonstrates that ARS reduces token usage, latency, and energy consumption by up to 53%, 46.1%, and 57.9%, respectively, while maintaining or improving accuracy.
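
The abstract names three ingredients: certainty estimated at multiple checkpoints, progressive thresholds, and suppression of the remaining reasoning once the model appears certain. The Python sketch below is only one possible reading of that description, not the paper's algorithm. The function name `suppress_reasoning`, the parameters `checkpoint_every` and `window`, the caller-supplied `certainty_fn`, and the threshold values are all hypothetical; the summary does not specify how certainty is measured or how the thresholds progress.

```python
from typing import Callable, List, Sequence, Tuple


def suppress_reasoning(
    steps: Sequence[str],
    certainty_fn: Callable[[List[str]], float],
    thresholds: Sequence[float],
    checkpoint_every: int = 2,
    window: int = 2,
) -> Tuple[List[str], bool]:
    """Illustrative early-stopping loop (hypothetical, not the paper's method).

    steps            -- reasoning steps in the order the model would emit them
    certainty_fn     -- assumed scorer returning a certainty in [0, 1] for the
                        steps kept so far
    thresholds       -- one threshold per checkpoint; the last value is reused
                        once the schedule runs out
    checkpoint_every -- number of steps between certainty checks
    window           -- number of recent checkpoints averaged before deciding
    """
    kept: List[str] = []
    history: List[float] = []
    checkpoint = 0
    for i, step in enumerate(steps, start=1):
        kept.append(step)
        if i % checkpoint_every != 0:
            continue
        # Multi-checkpoint estimate: average the most recent certainty scores.
        history.append(certainty_fn(kept))
        recent = history[-window:]
        average = sum(recent) / len(recent)
        # Progressive threshold: pick the value scheduled for this checkpoint.
        threshold = thresholds[min(checkpoint, len(thresholds) - 1)]
        checkpoint += 1
        # Suppress the remaining steps only after a full window agrees.
        if len(recent) == window and average >= threshold:
            return kept, True
    return kept, False


if __name__ == "__main__":
    demo_steps = [f"step {n}" for n in range(1, 9)]
    # Toy certainty signal that grows with the number of steps kept.
    kept, stopped = suppress_reasoning(
        demo_steps,
        certainty_fn=lambda so_far: min(1.0, 0.2 * len(so_far)),
        thresholds=[0.9, 0.8, 0.7],
    )
    print(len(kept), stopped)
```

Passing the threshold schedule in as data keeps the sketch neutral about whether thresholds should tighten or relax over time, which this summary leaves open.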

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, 2 tables, and 1 algorithm.

Figures (3)

  • Figure 1: Performance comparison on the GSM8K dataset. ARS (highlighted with red shading) achieves the best balance of efficiency and accuracy across all metrics.
  • Figure 2: Performance comparison on the MATH500 dataset. ARS (highlighted with red shading) demonstrates consistent efficiency gains while maintaining competitive accuracy across different model architectures.
  • Figure 3: Illustration of ARS's effectiveness through a detailed example from the MATH500 dataset, showing how different methods handle the same geometric sequence problem.