
Probing the "Psyche" of Large Reasoning Models: Understanding Through a Human Lens

Yuxiang Chen, Zuohan Wu, Ziwei Wang, Xiangning Yu, Xujia Li, Linyi Yang, Mengyue Yang, Jun Wang, Lei Chen

TL;DR

This work introduces a fine-grained chain-of-thought (CoT) taxonomy for large reasoning models, grounded in human cognitive theory, that enables analysis of reasoning traces at the level of atomic steps. It constructs a labeled dataset of 277,534 atomic CoT steps. It also proposes CAPO, an LLM-based automatic annotation framework that uses genetic-algorithm (GA)-inspired prompt optimization to achieve high consistency with human annotations. The authors report four actionable insights, concerning information organization, hypothesis generation, reflection, and redundancy. They also show that CAPO scales annotation while staying close to human-level agreement. Overall, the study offers a principled, scalable approach to understanding and improving the reasoning of large reasoning models through human-centric analysis.
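The TL;DR mentions GA-inspired prompt optimization. Below is a minimal sketch of that general idea. It is not CAPO's actual algorithm, and it rests on stated assumptions:

- A prompt is modeled as a fixed-length list of instruction segments.
- A fitness function scores a prompt by how well the resulting annotations agree with human labels.
- The helpers `crossover`, `mutate` and `evolve`, the segment pool, and the toy fitness are all hypothetical. A real fitness function would run an LLM annotator over a labeled development set.

```python
import random
from typing import Callable, List

# Hypothetical representation: a prompt is a fixed-length list of instruction segments.
Prompt = List[str]


def crossover(a: Prompt, b: Prompt, rng: random.Random) -> Prompt:
    """Single-point crossover: prefix of parent `a` joined to suffix of parent `b`."""
    cut = rng.randint(1, min(len(a), len(b)) - 1)  # requires prompts of >= 2 segments
    return a[:cut] + b[cut:]


def mutate(p: Prompt, pool: List[str], rate: float, rng: random.Random) -> Prompt:
    """Swap each segment for a random pool candidate with probability `rate`."""
    return [rng.choice(pool) if rng.random() < rate else seg for seg in p]


def evolve(population: List[Prompt], fitness: Callable[[Prompt], float],
           pool: List[str], generations: int = 20, elite: int = 2,
           rate: float = 0.2, seed: int = 0) -> Prompt:
    """Evolve prompts; `fitness` is a hypothetical stand-in for human agreement."""
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(elite, len(ranked) // 2)]  # truncation selection
        children = [list(p) for p in ranked[:elite]]       # elitism: keep the best unchanged
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), pool, rate, rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness standing in for "annotate a dev set with the LLM and score agreement".
    pool = ["Label each step.", "Use exactly one category.", "Quote the step.",
            "Be concise.", "Refer to the taxonomy definitions."]
    target = {"Label each step.", "Use exactly one category.",
              "Refer to the taxonomy definitions."}

    def toy_fitness(p: Prompt) -> float:
        return sum(seg in target for seg in p) / len(p)

    init_rng = random.Random(1)
    init = [[init_rng.choice(pool) for _ in range(3)] for _ in range(8)]
    best = evolve(init, toy_fitness, pool)
    print(best, toy_fitness(best))
```

Because the top prompts are carried over unchanged each generation, the best fitness never decreases. This is a common design choice when each fitness evaluation (here, an LLM annotation pass) is expensive.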

Abstract

Large reasoning models (LRMs) have garnered significant attention from researchers owing to their exceptional capability in addressing complex tasks. Motivated by the human-like behaviors observed in their reasoning processes, this paper introduces a comprehensive taxonomy to characterize atomic reasoning steps and probe the "psyche" of LRM intelligence. The taxonomy comprises five groups and seventeen categories derived from human mental processes, thereby grounding the understanding of LRMs in an interdisciplinary perspective. Applying the taxonomy to current LRMs yields a labeled dataset of 277,534 atomic reasoning steps. Using this resource, we analyze contemporary LRMs and distill several actionable takeaways for improving the training and post-training of reasoning models. Notably, our analysis reveals that the prevailing post-answer "double-checks" (self-monitoring evaluations) are largely superficial and rarely yield substantive revisions. Thus, incentivizing comprehensive multi-step reflection, rather than simple self-monitoring, may offer a more effective path forward. To complement the taxonomy, we propose CAPO, an automatic annotation framework that leverages large language models (LLMs) to generate taxonomy-based annotations. Experimental results demonstrate that CAPO achieves higher consistency with human experts than baselines, facilitating scalable and comprehensive analysis of LRMs from a human cognitive perspective. Together, the taxonomy, CAPO, and the derived insights provide a principled, scalable path toward understanding and advancing LRM reasoning.
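The abstract reports that CAPO's annotations agree with human experts more closely than the baselines do. It does not say which agreement statistic is used. As an assumption, a common choice for categorical labels is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below computes it. The step labels are hypothetical placeholders, not the taxonomy's actual seventeen categories.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two annotators' labels for the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1.0:
        # Both annotators used the same single label throughout; kappa is undefined,
        # so by convention report perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for four atomic steps, from a human expert and an LLM annotator.
human = ["plan", "verify", "plan", "recall"]
llm = ["plan", "verify", "recall", "recall"]
print(round(cohens_kappa(human, llm), 4))  # 0.6364
```

In this example raw agreement is 0.75, but chance agreement is 5/16, so kappa is (0.75 - 0.3125) / (1 - 0.3125) = 7/11.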

Paper Structure

This paper contains 17 sections, 6 figures, 3 tables, and 1 algorithm.

Figures (6)

  • Figure 1: A brief illustration of the proposed taxonomy from human perspectives.
  • Figure 2: Proportional differences, where red dots mark differences that are statistically significant (U test). The purple line indicates the proportion of each mental process across all samples.
  • Figure 3: Post-answer check statistics. A→B denotes correctness before (A) and after (B) the post-answer check; e.g., I→C means an incorrect answer is fixed by the check.
  • Figure 4: PNS values of reasoning steps for Q1–Q10 before and after causal intervention.
  • Figure 5: The CAPO framework, where each prompt comprises separate constant, variable, and mutable areas.
  • ...and 1 more figure
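Figure 3 summarizes how answer correctness changes across a post-answer check, using C (correct) and I (incorrect). The sketch below shows one way to produce such a tally. It assumes each reasoning trace has already been reduced to a pair of booleans: whether the answer was correct before the check and whether it was correct after. The record format and the helpers `transition_counts` and `revision_rate` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def _tag(correct: bool) -> str:
    """Map a correctness flag to the C/I letters used in Figure 3."""
    return "C" if correct else "I"


def transition_counts(records: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count A->B transitions from (correct_before, correct_after) pairs."""
    return Counter(f"{_tag(before)}->{_tag(after)}" for before, after in records)


def revision_rate(counts: Counter) -> float:
    """Fraction of checks that flipped correctness (I->C or C->I)."""
    total = sum(counts.values())
    changed = counts["I->C"] + counts["C->I"]
    return changed / total if total else 0.0


# Hypothetical records for five traces that each ended with a post-answer check.
records = [(True, True), (True, True), (False, False), (False, True), (True, True)]
counts = transition_counts(records)
print(dict(counts))            # {'C->C': 3, 'I->I': 1, 'I->C': 1}
print(revision_rate(counts))   # 0.2
```

Under this tally, the abstract's observation that double-checks rarely yield substantive revisions would appear as a low revision rate. Most of the mass would sit on C→C and I→I.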