Think How to Think: Mitigating Overthinking with Autonomous Difficulty Cognition in Large Reasoning Models
Yongjiang Liu, Haoxi Li, Xiaosong Ma, Jie Zhang, Song Guo
TL;DR
The paper tackles overthinking in large reasoning models by introducing TH2T, a two-stage fine-tuning framework that injects difficulty cognition and redundancy cognition through self-hypnosis cues embedded in the model's own output. Stage 1 calibrates overall reasoning depth using a hybrid dataset that mixes short reasoning paths for easy problems with long paths for hard ones. Stage 2 trims in-progress reasoning by detecting and truncating redundant or looping steps. Results on 7B/14B/32B models across GSM8K, MATH, AIME2024, Omni-MATH, and GPQA show token-length reductions of over 70% on easy tasks and 40% on hard tasks with stable accuracy, along with improved difficulty awareness and fewer reflective or looping patterns. The approach replaces external prompt-based controls with endogenous metacognitive cues, making reasoning more efficient and robust for real-world deployment; ablations indicate that both stages are necessary.
Abstract
Recent Large Reasoning Models (LRMs) excel at complex reasoning tasks but often suffer from overthinking, generating overly long and redundant reasoning trajectories. To understand the cause, we conduct an empirical analysis, which reveals that LRMs are limited in their ability to recognize task properties (i.e., difficulty levels) before solving a problem, as humans do, leading to a one-size-fits-all reasoning process. This raises a natural question: can we explicitly bootstrap this ability to alleviate overthinking in LRMs? In this paper, we propose Think-How-to-Think (TH2T), a novel two-stage fine-tuning strategy that progressively develops LRMs' difficulty cognition and redundancy cognition. Specifically, we first inject difficulty hypnosis into output prefixes to guide the model toward adaptive reasoning depth, training on a hybrid dataset that mixes short and long reasoning paths. Then, we incorporate redundancy hypnosis, which supervises the intermediate reasoning steps to identify and eliminate unnecessary reasoning patterns. Experiments on 7B/14B/32B models demonstrate that TH2T significantly reduces inference costs, by over 70% on easy tasks and 40% on hard tasks, while maintaining performance stability. The resulting outputs exhibit clear signs of difficulty awareness and reduced redundancy (e.g., less reflection and looping).
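The two stages described above can be illustrated with a minimal sketch of how the fine-tuning targets might be assembled. This is not the authors' implementation: the hypnosis strings, the `Sample` fields, the function names, and the duplicate-step redundancy check are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass

# Hypothetical hypnosis strings; the exact wording used in the paper is not given here.
EASY_HYPNOSIS = "This looks like an easy problem, so a brief solution should suffice."
HARD_HYPNOSIS = "This looks like a hard problem, so careful step-by-step reasoning is needed."
REDUNDANCY_HYPNOSIS = "The remaining steps repeat earlier work, so I will stop and answer."


@dataclass
class Sample:
    question: str
    short_path: str  # concise reasoning trace (assumed to be available)
    long_path: str   # full-length reasoning trace (assumed to be available)
    is_easy: bool    # difficulty label (assumed to be given)


def build_target(sample: Sample) -> str:
    """Stage 1 sketch: prefix the chosen reasoning path with a difficulty hypnosis line."""
    if sample.is_easy:
        return f"{EASY_HYPNOSIS}\n{sample.short_path}"
    return f"{HARD_HYPNOSIS}\n{sample.long_path}"


def build_hybrid_dataset(samples):
    """Mix short-path (easy) and long-path (hard) targets into one training set."""
    return [{"prompt": s.question, "completion": build_target(s)} for s in samples]


def truncate_at_redundancy(steps):
    """Stage 2 sketch: cut the trace at the first repeated step and append a hypnosis line.

    Redundancy is approximated here by exact repetition after whitespace and
    case normalization, which is a stand-in for whatever detector the paper uses.
    """
    seen = set()
    kept = []
    for step in steps:
        key = " ".join(step.lower().split())
        if key in seen:
            kept.append(REDUNDANCY_HYPNOSIS)
            break
        seen.add(key)
        kept.append(step)
    return kept
```

In this sketch, easy samples are paired with short traces and hard samples with long traces, so the prefix and the reasoning length are learned together. The truncated traces from `truncate_at_redundancy` would serve as second-stage targets under the same assumptions.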
