The First Impression Problem: Internal Bias Triggers Overthinking in Reasoning Models

Renfei Dang, Zhening Li, Shujian Huang, Jiajun Chen

TL;DR

Interpretability experiments suggest that excessive attention to the input question is a key mechanism through which internal bias influences subsequent reasoning trajectories.

Abstract

Reasoning models often exhibit overthinking, characterized by redundant reasoning steps. We identify internal bias elicited by the input question as a key trigger of such behavior. Upon encountering a problem, the model immediately forms a preliminary guess about the answer, which we term an internal bias since it may not be explicitly generated, and it arises without systematic reasoning. When this guess conflicts with its subsequent reasoning, the model tends to engage in excessive reflection, resulting in wasted computation. We validate the association between internal bias and overthinking across multiple models and diverse reasoning tasks. To demonstrate the causal relationship more rigorously, we conduct two counterfactual interventions, showing that removing the input question after the model reduces the redundant reasoning across various complex reasoning tasks, and manually injecting bias affects overthinking accordingly. Further interpretability experiments suggest that excessive attention to the input question serves as a key mechanism through which internal bias influences subsequent reasoning trajectories. Finally, we evaluate several methods aimed at mitigating overthinking, yet the influence of internal bias persists under all conditions.

Paper Structure

This paper contains 53 sections, 2 equations, 19 figures, 12 tables.

Figures (19)

  • Figure 1: Two examples illustrating the existence of internal bias. Green text denotes correct answers derived from reasoning, while the navy-colored portions show the influence of internal bias. In the left example (a simpler case), the model develops an internal bias of "2", conflicting with the reasoning result "3". In the more complex example on the right (from AIME 2024), the model predicts the answer to be approximately "20", deviating significantly from the correct value "211" obtained via reasoning. For both examples, the reasoning process is manually separated into chunks for better illustration: the model obtains the correct answer in the first chunk, but internal bias still triggers later reflection. The number in the bottom-right corner of each chunk indicates its length. The model clearly spends many more tokens on reflection due to internal bias.
  • Figure 2: Correlation between deviation degree and reasoning behavior. The light-colored bars represent the full reasoning length, while the dark-colored bars indicate the position at which the model first provides an answer. The orange line shows the number of reflection keywords. Qwen-14B here is short for R1-Distill-Qwen-14B. The method used to identify the position at which the model first provides an answer during its reasoning process is described in the paper's appendix.
  • Figure 3: The "strawberry" example, illustrating abnormally high attention scores on the question part when a reflection token is about to be output. Color intensity represents the ratio of the attention scores assigned to each preceding token at two steps: generating "three" and generating "Wait". Darker red indicates higher relative attention at the reflection point, when "Wait" is generated, while darker blue indicates higher relative attention when generating "three".
  • Figure 4: (a) Group-level scores $S_G^c$ computed from attention scores averaged over layers 21 to 30. Similar visualizations for other layers appear in the paper's appendix and show the same trends. (b) The ratio $S_\text{Reflection}^c/S_\text{Other}^c$ across all layers.
  • Figure 5: Results on KnowLogic of R1-Distill-Qwen-14B. The model performs poorly on the dataset, with inconsistency rates exceeding 75% in over 80% of the cases, leading to the abnormal last bar. But the first three bars still exhibit the expected trend.
  • ...and 14 more figures
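
The comparison behind Figures 3 and 4 (attention paid to the question when a reflection token is generated versus at other steps) can be illustrated with a minimal sketch. This is not the paper's definition of $S_G^c$, which is not given on this page. The function name `question_attention_ratio`, its inputs, and the choice to measure the share of attention mass on question tokens are all assumptions made for illustration.

```python
import numpy as np

def question_attention_ratio(attn, question_mask, reflection_steps):
    """Hypothetical helper, not the paper's metric.

    attn:             (num_steps, num_context) attention weights, one row per
                      generation step (assumed already averaged over heads/layers).
    question_mask:    (num_context,) bool, True for tokens of the input question.
    reflection_steps: (num_steps,) bool, True where the generated token is a
                      reflection keyword such as "Wait".
    """
    attn = np.asarray(attn, dtype=float)
    question_mask = np.asarray(question_mask, dtype=bool)
    reflection_steps = np.asarray(reflection_steps, dtype=bool)
    if not reflection_steps.any() or reflection_steps.all():
        raise ValueError("need at least one reflection step and one other step")

    # Share of each step's attention mass that falls on question tokens.
    question_share = attn[:, question_mask].sum(axis=1) / attn.sum(axis=1)

    # Average that share separately for reflection steps and all other steps.
    s_reflection = question_share[reflection_steps].mean()
    s_other = question_share[~reflection_steps].mean()
    return s_reflection, s_other, s_reflection / s_other

# Toy data: 3 generation steps, 3 context tokens (the first two are the question).
attn = [[0.1, 0.1, 0.8],
        [0.3, 0.3, 0.4],   # the reflection step attends more to the question
        [0.1, 0.1, 0.8]]
s_ref, s_other, ratio = question_attention_ratio(
    attn, question_mask=[True, True, False], reflection_steps=[False, True, False]
)
print(s_ref, s_other, ratio)
```

Under these assumptions, a ratio above 1 means that reflection steps put more of their attention on the question than other steps do, which is the pattern the captions describe.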