Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models

Yongjiang Liu, Haoxi Li, Xiaosong Ma, Jie Zhang, Song Guo

TL;DR

The paper tackles overthinking in large reasoning models by introducing TH2T, a two-stage fine-tuning framework that injects Difficulty Cognition and Redundancy Cognition via internal self-hypnosis prompts. Stage 1 calibrates global reasoning depth using a hybrid easy/hard dataset, while Stage 2 trims in-progress reasoning by detecting and truncating redundant or looping steps. Experiments on 7B/14B/32B models across GSM8K, MATH, AIME2024, Omni-MATH, and GPQA show token-length reductions of over 70% on easy tasks and 40% on hard tasks with stable accuracy, along with improved difficulty awareness and fewer reflective or looping patterns. The approach replaces external prompt-based controls with endogenous metacognitive cues, and ablations indicate that both stages are necessary.

Abstract

Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking, generating overly long and redundant reasoning trajectories. Our empirical analysis of the root cause reveals that LRMs, unlike humans, are largely unable to recognize task properties (i.e., difficulty levels) before solving a problem, which leads to a one-size-fits-all reasoning process. This raises a natural question: can we explicitly bootstrap this ability to alleviate overthinking in LRMs? In this paper, we propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively instills difficulty cognition and redundancy cognition in LRMs. Specifically, we first inject difficulty hypnosis into output prefixes to guide the model toward adaptive reasoning depth, training on a hybrid dataset that mixes short and long reasoning paths. Then, we incorporate redundancy hypnosis, which supervises the intermediate reasoning steps to identify and eliminate unnecessary reasoning patterns. Experiments on 7B/14B/32B models demonstrate that TH2T significantly reduces inference costs, by over 70% on easy tasks and 40% on hard tasks, while maintaining performance stability. The resulting outputs exhibit clear signs of difficulty-aware capabilities and reduced redundancy (e.g., reflection and looping).
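
The first stage described in the abstract (a difficulty hypnosis prefixed to the output, trained on a mix of short and long reasoning paths) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hypnosis strings, the `Sample` fields, and the rule that easy questions get the short path while hard questions get the long path are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical hypnosis strings; the paper's actual wording is not given here.
EASY_HYPNOSIS = "This is a simple question, let's think quickly."
HARD_HYPNOSIS = "This is a hard question, let's think carefully."


@dataclass
class Sample:
    question: str
    short_response: str  # concise reasoning path
    long_response: str   # full-length reasoning path
    is_easy: bool        # difficulty label, assumed to be given


def build_stage1_example(sample: Sample) -> dict:
    """Build one fine-tuning pair whose target starts with a difficulty hypnosis.

    Assumption: easy questions are paired with the short path and hard
    questions with the long path, so the prefix correlates with depth.
    """
    if sample.is_easy:
        target = f"{EASY_HYPNOSIS}\n{sample.short_response}"
    else:
        target = f"{HARD_HYPNOSIS}\n{sample.long_response}"
    return {"prompt": sample.question, "completion": target}
```

Because the hypnosis sits inside the training target rather than in the prompt, a model fine-tuned this way would have to produce the difficulty judgment itself at inference time, which is the distinction the paper draws between internal self-hypnosis and external reminders.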

Paper Structure

This paper contains 30 sections, 9 equations, 18 figures, 4 tables.

Figures (18)

  • Figure 1: Comparison with baselines on R1-Distill-Qwen series models. The accuracy and token length of the original LRM are normalized to 100%.
  • Figure 2: LRM's difficulty cognition. The LRM is prompted to assess the difficulty level of questions from the relatively straightforward GSM8K (Cobbe et al., 2021) and from highly complex tasks (MATH (Hendrycks et al., 2021), AIME2024, and OmniMath (Gao et al., 2024)). Our method substantially reduces difficulty cognition conflation, consistently across both 3-option (e.g., Easy, Medium, Hard) and 2-option (e.g., Easy, Hard) setups.
  • Figure 3: Internal self-hypnosis vs. external difficulty reminder for response length regulation. Our internal self-hypnosis instills a robust self-regulatory mechanism, demonstrating superior effectiveness, reliability, and instruction-following capabilities compared to conventional external prompts.
  • Figure 4: Framework of TH2T. Stage 1: fine-tune with difficulty-differentiated data with injected difficulty-hypnosis, providing global, prospective signals for strategy selection. Stage 2: fine-tune with truncation and redundancy-hypnosis injection, providing local, retrospective signals for in-process intervention. Inference: autonomous difficulty and redundancy adaptation under the intervention of native self-hypnosis.
  • Figure 5: Length distribution statistics on GSM8K. Our approach eliminates longer responses, especially repetitive ones that reach the maximum generation length, yielding less redundant and more efficient outputs.
  • ...and 13 more figures
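
The Stage 2 idea in the Figure 4 caption (truncate the reasoning and inject a redundancy hypnosis at that point) can be sketched in Python as follows. This is only an illustration under stated assumptions: the hypnosis string is hypothetical, the reasoning is assumed to be pre-split into steps, and looping is detected with a simple repeated-step heuristic that stands in for whatever redundancy criterion the paper actually uses.

```python
# Hypothetical redundancy hypnosis; the paper's wording is not given here.
REDUNDANCY_HYPNOSIS = "I have already covered this; no need to repeat it."


def truncate_at_first_loop(steps: list[str]) -> list[str]:
    """Cut a reasoning trace at the first repeated step and append the hypnosis.

    Assumption: a step counts as redundant when its whitespace- and
    case-normalized text has already appeared earlier in the trace.
    """
    seen: set[str] = set()
    for i, step in enumerate(steps):
        key = " ".join(step.lower().split())
        if key in seen:
            # Keep everything before the repeat, then signal redundancy.
            return steps[:i] + [REDUNDANCY_HYPNOSIS]
        seen.add(key)
    # No repeated step found: leave the trace unchanged.
    return steps
```

Traces processed this way could serve as Stage 2 fine-tuning targets, giving the local, retrospective signal the caption describes, in contrast to the global, prospective prefix used in Stage 1.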