On Reasoning Strength Planning in Large Reasoning Models

Leheng Sheng, An Zhang, Zijian Wu, Weixiang Zhao, Changshuo Shen, Yi Zhang, Xiang Wang, Tat-Seng Chua

TL;DR

This work investigates whether large reasoning models pre-plan their reasoning strength before generating answers. Using linear probes on question activations, the authors show that reasoning length can be predicted with a correlation of about 0.84, suggesting advance planning. They uncover pre-allocated direction vectors in activation space whose magnitude encodes the planned reasoning length, and demonstrate their causal influence by steering activations to modulate the logits of the end-of-thinking token, thereby adjusting reasoning length and, in some cases, task performance. The findings point to potential applications in detecting overthinking before generation and in achieving more efficient inference; the authors also outline limitations and avenues for future research.
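The central measurement in the TL;DR, predicting reasoning length from question activations with a linear probe, can be illustrated with a minimal sketch. It assumes one activation vector per question (for example, the hidden state at the final question token of a single layer) and fits a ridge-regularized linear regression on synthetic data. The function names, the ridge penalty, and the data are illustrative assumptions, not the authors' code, and the correlation it prints says nothing about the paper's reported 0.84.

```python
import numpy as np

def fit_length_probe(acts, lengths, ridge=1e-2):
    """Fit a ridge-regularized linear probe: lengths ~ acts @ w + b."""
    n, d = acts.shape
    X = np.hstack([acts, np.ones((n, 1))])  # append a bias column
    reg = ridge * np.eye(d + 1)
    reg[-1, -1] = 0.0  # do not penalize the bias term
    wb = np.linalg.solve(X.T @ X + reg, X.T @ lengths)
    return wb[:-1], wb[-1]

def predict(acts, w, b):
    """Predicted reasoning length for each question activation."""
    return acts @ w + b

def pearson(a, b):
    """Pearson correlation between predictions and targets."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-in data (NOT the paper's data): lengths depend linearly
# on a hidden direction in a 64-dimensional "activation" space, plus noise.
rng = np.random.default_rng(0)
d = 64
true_dir = rng.normal(size=d)
acts = rng.normal(size=(500, d))
lengths = acts @ true_dir + rng.normal(scale=2.0, size=500)

# Train on the first 400 questions, evaluate on the held-out 100.
train, test = slice(0, 400), slice(400, 500)
w, b = fit_length_probe(acts[train], lengths[train])
r = pearson(predict(acts[test], w, b), lengths[test])
print(f"held-out Pearson r = {r:.2f}")
```

A layer-wise analysis such as the one in Figure 2 would repeat this fit once per layer and compare the held-out correlations.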

Abstract

Recent studies empirically reveal that large reasoning models (LRMs) can automatically allocate more reasoning strength (i.e., more reasoning tokens) to harder problems, exhibiting a difficulty awareness that improves task performance. While this automatic allocation of reasoning strength has been widely observed, its underlying mechanism remains largely unexplored. To this end, we explain this phenomenon from the perspective of model activations. We find evidence that LRMs pre-plan their reasoning strength in their activations even before generation, and that this reasoning strength is causally controlled by the magnitude of a pre-allocated direction vector. Specifically, we show that the number of reasoning tokens can be predicted from the question activations alone using linear probes, indicating that LRMs estimate the required reasoning strength in advance. We then find that LRMs encode this reasoning strength through a pre-allocated direction vector embedded in their activations, whose magnitude modulates the reasoning strength. Subtracting this vector can reduce both the number of reasoning tokens and performance, while adding it can increase the number of reasoning tokens and even improve performance. We further show that this direction vector consistently yields a positive reasoning length prediction, and that it modifies the logits of the end-of-reasoning token </think> to affect the reasoning length. Finally, we demonstrate two potential applications of our findings: detecting overthinking behavior and enabling efficient reasoning on simple problems. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors. Our code is available at https://github.com/AlphaLab-USTC/LRM-plans-CoT.
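The steering intervention described in the abstract amounts to adding a scaled copy of the direction vector to the hidden states. The toy sketch below replaces the full model with a single linear readout and assumes a direction that is anti-aligned with the </think> unembedding row, so that adding the vector lowers the end-of-thinking logit (which would tend to lengthen reasoning) while subtracting it raises that logit. The token id, dimensions, and readout are hypothetical; in a real LRM the shift would be applied inside the network (for example through a forward hook on a decoder layer), and its effect on the logit would pass through later layers and normalization rather than being exactly linear.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Activation steering: shift every position's hidden state by alpha * direction."""
    return hidden + alpha * direction

def think_end_logit(hidden_last, unembed, think_id):
    """Toy linear readout: logit of the end-of-thinking token at the last position."""
    return float(hidden_last @ unembed[think_id])

rng = np.random.default_rng(2)
d, vocab = 32, 100
think_id = 7  # hypothetical token id standing in for </think>
unembed = rng.normal(size=(vocab, d))  # toy unembedding matrix

# Toy assumption: the length direction points away from the </think> readout row.
direction = -unembed[think_id] / np.linalg.norm(unembed[think_id])
hidden = rng.normal(size=(10, d))  # hidden states for 10 token positions

# Negative alpha subtracts the vector, positive alpha adds it.
logits = {}
for alpha in (-4.0, 0.0, 4.0):
    logits[alpha] = think_end_logit(steer(hidden, direction, alpha)[-1], unembed, think_id)
    print(f"alpha={alpha:+.0f}  </think> logit={logits[alpha]:.2f}")
```

Because the readout is linear here, each unit of alpha moves the </think> logit by exactly the norm of its unembedding row; the direction of the effect, not its exact size, is what the sketch is meant to convey.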

Paper Structure

This paper contains 37 sections, 4 equations, 24 figures, 10 tables.

Figures (24)

  • Figure 1: (a) The reasoning length is predictable before the first reasoning token is generated. (b) The activations of questions shift toward a pre-allocated direction as difficulty increases; orange stars denote the mean activations of each difficulty level. (c) Steering the activations of LRMs with this direction vector causally affects the number of reasoning tokens and thereby the performance.
  • Figure 2: Layer-wise linear regression results
  • Figure 3: Cosine similarity between pre-allocated vectors across different difficulties. These vectors exhibit extremely high cosine similarities, indicating that LRMs pre-allocate a single direction vector for distinguishing question difficulties.
  • Figure 4: Layer-wise cosine similarities between four pre-allocated vectors
  • Figure 5: L2 norms of four pre-allocated vectors. The norm grows as difficulty increases.
  • ...and 19 more figures
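The cosine-similarity and norm analyses in Figures 3 to 5 can be approximated with a difference-of-means construction. The sketch below assumes, as a hypothesis rather than the paper's stated recipe, that each pre-allocated vector is the mean question activation at difficulty level k minus the mean at the easiest level, which yields four vectors for five levels. The data are synthetic and generated so that difficulty shifts activations along one shared direction; the function names are illustrative.

```python
import numpy as np

def difference_of_means(acts_by_level):
    """Hypothetical construction: v_k = mean(level k) - mean(level 1) for k >= 2."""
    base = acts_by_level[0].mean(axis=0)
    return [a.mean(axis=0) - base for a in acts_by_level[1:]]

def cosine_matrix(vecs):
    """Pairwise cosine similarities between the vectors."""
    V = np.stack(vecs)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return V @ V.T

# Synthetic activations (NOT the paper's data): five difficulty levels,
# each shifted further along one shared unit direction.
rng = np.random.default_rng(1)
d = 128
shared = rng.normal(size=d)
shared /= np.linalg.norm(shared)
acts_by_level = [rng.normal(size=(1000, d)) + 3.0 * level * shared for level in range(1, 6)]

vecs = difference_of_means(acts_by_level)
cos = cosine_matrix(vecs)  # 4 x 4 similarity matrix
norms = [float(np.linalg.norm(v)) for v in vecs]  # L2 norm of each vector
print(np.round(cos, 3))
print(np.round(norms, 2))
```

On data built this way, the cosine matrix is close to 1 everywhere and the norms grow with difficulty, which is the qualitative pattern the Figure 3 and Figure 5 captions describe.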