Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Doohyuk Jang, Yoonjeon Kim, Chanjae Park, Hyun Ryu, Eunho Yang

TL;DR

The paper identifies reasoning rigidity in large language models, where explicit user constraints are overridden in favor of ingrained reasoning templates, compromising correctness in math and logic tasks. It introduces ReasoningTrap, a diagnostic dataset with two subsets, ConditionedMath and PuzzleTrivial, designed to stress-test adherence to user instructions, along with an automated Contamination Ratio metric and a p-pass@k evaluation that separates perception from final correctness. Empirical results show that base models often outperform their reasoning-tuned variants on key adherence metrics, while more advanced models exhibit stronger contamination as reasoning grows longer, motivating mitigation via problem restatement and targeted prompts. The work provides a public diagnostic resource and analysis framework for advancing faithful reasoning in LLMs, with implications for robust instruction-following in complex reasoning tasks.
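p-pass@k is named here but not defined in this section. As a rough sketch only, the Python below builds it on the standard unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), and assumes, hypothetically, that a sample counts as a success only when the model both perceives the modified condition and reaches the correct final answer. The paper's exact definition may differ, and the names `pass_at_k` and `p_pass_at_k` are illustrative, not the authors' code.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Standard unbiased pass@k estimator.

    Probability that at least one of k samples, drawn without replacement
    from n generations of which c are successes, is a success.
    """
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def p_pass_at_k(perceived: Sequence[bool], correct: Sequence[bool], k: int) -> float:
    """Hypothetical p-pass@k (assumed definition, not taken from the paper).

    A sample counts as a success only if the model perceived the modified
    condition AND produced the correct final answer.
    """
    if len(perceived) != len(correct):
        raise ValueError("perceived and correct must have the same length")
    successes = sum(p and c for p, c in zip(perceived, correct))
    return pass_at_k(len(correct), successes, k)
```

For example, with four samples where `perceived = [True, True, False, True]` and `correct = [True, False, True, True]`, only two samples satisfy both conditions, so `p_pass_at_k(perceived, correct, 1)` returns 0.5, whereas plain pass@1 on `correct` alone would be 0.75.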

Abstract

Large language models have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term reasoning rigidity. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzles, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce an expert-curated diagnostic set, ReasoningTrap. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinct modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each causing models to ignore or distort provided instructions. We publicly release our diagnostic set to facilitate future research on mitigating reasoning rigidity in language models.

Paper Structure

This paper contains 28 sections, 2 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Reasoning Rigidity in Well-Known Math Problem and Logic Puzzle. When solving subtly modified versions of a well-known math problem (AIME) and famous logic puzzles (Fibonacci Rabbit and Tower of Hanoi), advanced reasoning models such as Qwen3-32B and OpenAI o3 default to familiar reasoning templates, leading to incorrect conclusions.
  • Figure 2: Dataset Construction Pipeline. (a) The dataset construction pipeline of ConditionedMath consists of two steps. Step 1: Create new questions with unusual conditions that are (1) valid, (2) meaningfully different from the original, and (3) solvable without ambiguity. Two modified versions of a card-guessing problem are shown: Modif 1 introduces a small tweak that preserves validity and solvability, whereas Modif 2 includes an invalid condition (multiplying a card count by −3), rendering the problem unsolvable. (b) Despite the simplicity of the problem, reasoning models overcomplicate it and override the simple logic by defaulting to more complex problem templates (e.g., assuming a two-card setup).
  • Figure 3: Patterns Associated with Contamination Ratio. (a) The relationship between contamination ratio and p-pass@1 shows that contamination in the reasoning path does not affect the final output up to a certain point (approximately 40%), while contamination beyond this point sharply reduces the p-pass@1 score, indicating that the model becomes trapped in a wrong reasoning path and arrives at an incorrect output. (b) Measuring the contamination ratio within specific intervals of reasoning steps shows that reasoning traces ending in wrong outputs exhibit progressively worsening contamination as the number of reasoning steps increases.
  • Figure 4: Reasoning Pattern Analysis and Corresponding Prompt Hinting.
  • Figure 5: ConditionedMath (MATH500) sample problems
  • ...and 2 more figures
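Figure 3 relies on a per-trace contamination ratio and on its value within intervals of reasoning steps. A minimal sketch, assuming each reasoning step has already been labeled as contaminated or not (for instance by an automated judge that checks whether the step reasons about the original rather than the modified problem; this labeling procedure is an assumption here), computes the overall ratio and the ratio in consecutive fixed-size windows. `contamination_ratio` and `interval_contamination` are hypothetical helpers, not the authors' implementation.

```python
from typing import List, Sequence


def contamination_ratio(flags: Sequence[bool]) -> float:
    """Fraction of reasoning steps flagged as contaminated.

    Assumption: flags[i] is True when step i follows the familiar
    (original) problem template instead of the stated modified condition.
    """
    if not flags:
        return 0.0
    return sum(flags) / len(flags)


def interval_contamination(flags: Sequence[bool], window: int) -> List[float]:
    """Contamination ratio within consecutive, non-overlapping windows of steps.

    The last window may be shorter if len(flags) is not a multiple of window.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        contamination_ratio(flags[i:i + window])
        for i in range(0, len(flags), window)
    ]
```

With `flags = [False, False, True, False, True, True, True, True]`, the overall ratio is 0.625 and `interval_contamination(flags, 4)` returns `[0.25, 1.0]`, the kind of rising profile Figure 3(b) describes for wrong-output reasoning; such a trace would also sit above the roughly 40% level at which Figure 3(a) reports p-pass@1 dropping sharply. Fixed, non-overlapping windows are chosen here for simplicity; the paper's actual interval scheme is not specified in this section.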