Mirroring the Mind: Distilling Human-Like Metacognitive Strategies into Large Language Models

Ik-hwan Kim, Hyeongrok Han, Mingi Jung, Sangwon Yu, Jinseok Hong, Sang Hun Kim, Yoonyoung Choi, Sungroh Yoon

TL;DR

By effectively eliminating reasoning collapse, MBT achieves higher accuracy with significantly reduced token consumption, demonstrating that internalizing metacognitive strategies leads to more stable and robust reasoning.

Abstract

Large Reasoning Models (LRMs) often exhibit structural fragility in complex reasoning tasks, failing to produce correct answers even after successfully deriving valid intermediate steps. Through systematic analysis, we observe that these failures frequently stem not from a lack of reasoning capacity, but from a deficiency in self-regulatory control, where valid logic is destabilized by uncontrolled exploration or the failure to recognize logical sufficiency. Motivated by this observation, we propose Metacognitive Behavioral Tuning (MBT), a post-training framework that explicitly injects metacognitive behaviors into the model's thought process. MBT implements this via two complementary formulations: (1) MBT-S, which synthesizes rigorous reasoning traces from scratch, and (2) MBT-R, which rewrites the student's initial traces to stabilize intrinsic exploration patterns. Experiments across multi-hop QA benchmarks demonstrate that MBT consistently outperforms baselines, achieving notable gains on challenging benchmarks. By effectively eliminating reasoning collapse, MBT achieves higher accuracy with significantly reduced token consumption, demonstrating that internalizing metacognitive strategies leads to more stable and robust reasoning.

Paper Structure (84 sections, 9 equations, 16 figures, 12 tables)

Figures (16)

  • Figure 1: A representative reasoning trace generated by Qwen3-4B. Despite correctly deriving the answer in intermediate steps, the model ultimately discards it due to an unverified self-imposed constraint. This illustrates a critical deficiency in metacognitive monitoring, where valid logic is overridden by uncontrolled exploration rather than a lack of reasoning capacity.
  • Figure 2: Prevalence of Answer-Inclusive Errors across MHQA benchmarks using Qwen3-8B. Incorrect predictions are stratified into Answer-Inclusive (derived but discarded) and Answer-Exclusive (never derived). The substantial proportion of Answer-Inclusive errors reveals that performance gaps frequently stem from a failure to monitor and preserve correct intermediate conclusions, rather than insufficient knowledge or retrieval capacity.
  • Figure 3: The overall framework of Metacognitive Behavioral Tuning (MBT). We employ two complementary strategies for behavior injection: MBT-S synthesizes rigorous traces from scratch, while MBT-R rewrites the student's initial traces to stabilize intrinsic exploration. The model then internalizes these behaviors via Supervised Fine-Tuning (SFT), followed by Group Relative Policy Optimization (GRPO) to enhance reasoning robustness.
  • Figure 4: Structural stability analysis via Overthinking ($\xi_{\text{OT}}$) vs. Underthinking ($\xi_{\text{UT}}$) on MuSiQue. While baseline methods exhibit instability by drifting towards premature termination or excessive continuation, MBT-S and MBT-R successfully converge in the stable lower-left region. This confirms that MBT effectively regulates the reasoning process to prevent both extremes.
  • Figure 5: Comparison of Metacognition Scores on the MuSiQue dataset using Qwen3-4B. We evaluate the richness of metacognitive behaviors across various methods. The detailed evaluation prompt and scoring rubric are provided in Figure \ref{fig:figA6_metacognition_score_prompt}.
  • ...and 11 more figures
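
The stratification described in the Figure 2 caption (Answer-Inclusive versus Answer-Exclusive errors) can be illustrated with a minimal sketch. The matching rule below is an assumption made for illustration only: it treats an incorrect prediction as Answer-Inclusive when the normalized gold answer appears as a whole-token span anywhere in the reasoning trace. The exact criterion used in the paper is not stated in this section, and the names `normalize` and `classify_prediction` are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def classify_prediction(trace: str, prediction: str, gold: str) -> str:
    """Label one example as correct, answer_inclusive, or answer_exclusive.

    Assumption (not taken from the paper): a loose string match between the
    normalized gold answer and the normalized trace stands in for
    "the model derived the answer at some intermediate step".
    """
    gold_n = normalize(gold)
    if normalize(prediction) == gold_n:
        return "correct"
    # Pad with spaces so the gold answer must match whole tokens.
    if f" {gold_n} " in f" {normalize(trace)} ":
        return "answer_inclusive"  # derived in the trace, then discarded
    return "answer_exclusive"  # never derived in the trace


if __name__ == "__main__":
    trace = (
        "The director was born in Paris. "
        "Wait, maybe the answer must be a city in Italy, so Rome."
    )
    print(classify_prediction(trace, "Rome", "Paris"))
```

A string match of this kind is only a rough proxy: it can over-count when the gold answer is a common word that appears incidentally in the trace, so a stricter check (for example, a judged comparison of intermediate conclusions) may be preferable in practice.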