Probing the Difficulty Perception Mechanism of Large Language Models
Sunbowen Lee, Qingyu Yin, Chak Tou Leong, Jialiang Zhang, Yicheng Gong, Shiwen Ni, Min Yang, Xiaoyu Shen
TL;DR
The paper addresses whether large language models internally perceive problem difficulty and how such perception can be exploited for adaptive reasoning. It introduces a high-dimensional linear probe on final-token embeddings to predict difficulty with $\tilde{y}=\mathbf{w}^{\top}\mathbf{h}+b$ and a head-pattern localization framework that identifies last-layer attention heads responsible for difficulty signals. Key findings show that difficulty lies in a high-dimensional linear direction and that specific attention heads exhibit distinct activation patterns for easy versus hard problems, with causal evidence from head-wise ablations. The work demonstrates potential for automatic, scalable difficulty annotation, enabling more efficient benchmark construction and curriculum learning, and provides theoretical insights into the organization of internal representations for reasoning difficulty in LLMs.
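To make the probe $\tilde{y}=\mathbf{w}^{\top}\mathbf{h}+b$ concrete, here is a minimal sketch under stated assumptions. Synthetic 16-dimensional vectors stand in for real final-token embeddings, the difficulty labels are generated from a hidden linear rule, and the training setup (full-batch gradient descent on mean squared error with a small L2 penalty on $\mathbf{w}$) is a hypothetical choice, not necessarily the authors'. The held-out Pearson correlation is one plausible way to check whether difficulty is linearly decodable.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def fit_linear_probe(H, y, lr=0.05, epochs=300, l2=1e-3):
    """Fit y_tilde = w^T h + b by full-batch gradient descent on
    mean squared error plus an L2 penalty on w (ridge-style)."""
    n, d = len(H), len(H[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for h, target in zip(H, y):
            err = dot(w, h) + b - target  # residual of y_tilde vs. label
            for j in range(d):
                grad_w[j] += err * h[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (2.0 * grad_w[j] / n + 2.0 * l2 * w[j])
        b -= lr * 2.0 * grad_b / n
    return w, b

def pearson(a, c):
    ma, mc = sum(a) / len(a), sum(c) / len(c)
    cov = sum((x - ma) * (z - mc) for x, z in zip(a, c))
    va = sum((x - ma) ** 2 for x in a)
    vc = sum((z - mc) ** 2 for z in c)
    return cov / (va * vc) ** 0.5

# Hypothetical synthetic stand-in for embeddings h and difficulty labels y
rng = random.Random(0)
d = 16
true_w = [rng.gauss(0.0, 1.0) for _ in range(d)]
H = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(300)]
y = [dot(true_w, h) + 0.5 + rng.gauss(0.0, 0.1) for h in H]

# Train on the first 240 items, evaluate on the held-out 60
w, b = fit_linear_probe(H[:240], y[:240])
preds = [dot(w, h) + b for h in H[240:]]
r_test = pearson(preds, y[240:])
print(f"held-out Pearson r = {r_test:.3f}, intercept b = {b:.3f}")
```

The L2 penalty is one common safeguard for high-dimensional probes, where the embedding dimension can rival the number of labeled problems; whether the authors regularize this way is an assumption here.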
Abstract
Large language models (LLMs) are increasingly deployed on complex reasoning tasks, yet little is known about their ability to internally evaluate problem difficulty, an essential capability for adaptive reasoning and efficient resource allocation. In this work, we investigate whether LLMs implicitly encode problem difficulty in their internal representations. Using a linear probe on the final-token representations of LLMs, we demonstrate that the difficulty level of math problems can be linearly modeled. We further locate specific attention heads in the final Transformer layer that exhibit opposite activation patterns for simple and difficult problems, thereby enabling the perception of difficulty. Ablation experiments confirm the accuracy of this localization. Crucially, our experiments provide practical support for using LLMs as automatic difficulty annotators, which could substantially reduce reliance on costly human labeling in benchmark construction and curriculum learning. We also find that, at the token level, entropy and difficulty perception differ significantly. Our study reveals that difficulty perception in LLMs is not only present but also structurally organized, offering new theoretical insights and practical directions for future research. Our code is available at https://github.com/Aegis1863/Difficulty-Perception-of-LLMs.
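The idea of locating heads with opposite easy/hard activation patterns and then testing them by ablation can be illustrated with a toy sketch. Everything in it is hypothetical and not the authors' localization framework: per-head scalar activations are simulated for 8 heads, heads 2 and 5 are planted with opposite patterns, heads are ranked by a standardized mean difference between hard and easy problems, and ablation is modeled as zeroing a head's activation before a toy linear readout.

```python
import random
import statistics

NUM_HEADS = 8  # hypothetical number of heads in the final layer

def simulate_activations(n, difficulty, rng):
    """Hypothetical per-head scalar activations at the final token.
    Heads 2 and 5 are planted with opposite easy/hard patterns;
    every other head is pure noise."""
    sign = 1.0 if difficulty == "hard" else -1.0
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 1.0) for _ in range(NUM_HEADS)]
        row[2] += 1.5 * sign  # higher on hard, lower on easy
        row[5] -= 1.5 * sign  # higher on easy, lower on hard
        rows.append(row)
    return rows

def effect_size(easy, hard):
    """Standardized mean difference (hard minus easy) for one head."""
    pooled_sd = ((statistics.pvariance(easy) + statistics.pvariance(hard)) / 2) ** 0.5
    return (statistics.mean(hard) - statistics.mean(easy)) / pooled_sd

def rank_heads(easy_rows, hard_rows):
    """Return (head, effect size) pairs sorted by |effect size|, largest first."""
    scores = []
    for h in range(NUM_HEADS):
        e = [row[h] for row in easy_rows]
        d = [row[h] for row in hard_rows]
        scores.append((h, effect_size(e, d)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)

def separation(easy_rows, hard_rows, readout, ablate=()):
    """Mean readout score on hard minus easy problems, with the heads
    listed in `ablate` zeroed out before the linear readout."""
    def score(row):
        return sum(0.0 if h in ablate else w * a
                   for h, (w, a) in enumerate(zip(readout, row)))
    return (statistics.mean([score(r) for r in hard_rows])
            - statistics.mean([score(r) for r in easy_rows]))

rng = random.Random(0)
easy = simulate_activations(200, "easy", rng)
hard = simulate_activations(200, "hard", rng)

ranking = rank_heads(easy, hard)
top_heads = [h for h, _ in ranking[:2]]
readout = [s for _, s in sorted(ranking)]  # toy readout: effect sizes in head order

gap_full = separation(easy, hard, readout)
gap_ablated = separation(easy, hard, readout, ablate=top_heads)
print("ranking:", [(h, round(s, 2)) for h, s in ranking])
print(f"gap full = {gap_full:.2f}, gap with heads {top_heads} ablated = {gap_ablated:.2f}")
```

In this toy setting, ablating the two top-ranked heads shrinks the easy/hard gap in the readout nearly to zero, whereas the noise heads contribute almost nothing. In a real model, the analogous causal test would intervene on the attention heads' outputs during the forward pass rather than on simulated scalars.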
