Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou
TL;DR
This work addresses the reliability of large language models (LLMs) by examining their intrinsic meta-cognition. It proposes Automated Meta-cognition Evaluation (AutoMeco), which benchmarks meta-cognition lenses without human annotations, and introduces a training-free Markovian Intrinsic Reward Adjustment (MIRA) to improve step-level signals. AutoMeco uses a Process Reward Model (PRM) as a judge to annotate step correctness and evaluates lenses such as entropy, perplexity, and probability-based measures on three mathematical reasoning datasets. The authors show that meta-cognition signals correlate with PRM judgments and that MIRA improves the lenses in a majority of configurations across models and datasets, with only marginal latency overhead. The findings suggest a feasible path to more reliable LLM reasoning through automated meta-cognition benchmarking and intrinsic reward adjustment, with implications for self-improvement and safer deployment of reasoning systems. The work also discusses limitations related to model access, overhead, and extending the framework to larger reasoning models.
Abstract
Previous research has primarily focused on the cognitive error detection capabilities of Large Language Models (LLMs), often prompting them to analyze mistakes in reasoning chains. However, few studies have examined the meta-cognitive abilities of LLMs (e.g., their self-awareness of step errors), which are crucial for their reliability. Studies on LLM self-evaluation present some measures, such as perplexity, that reflect answer correctness and can be viewed as lenses of meta-cognition, but they lack step-level analysis and adaptation. This paper studies how to evaluate LLM meta-cognition using the current lenses and how to improve these lenses. Specifically, we propose AutoMeco, an Automated Meta-cognition Evaluation framework for benchmarking the existing lenses. Furthermore, we propose MIRA, a training-free Markovian Intrinsic Reward Adjustment strategy, to boost current meta-cognition lenses. Experimental results on three mathematical reasoning datasets and three LLMs show the reasonableness of AutoMeco by comparing it with Best-of-N verification. Moreover, the meta-cognition ability of LLMs can be better evaluated using MIRA.
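
As a rough illustration of what a step-level lens can look like, the sketch below computes three common self-evaluation measures (perplexity, token entropy, and token probability) from the per-token probability distributions of a single reasoning step. The function name `step_lenses`, the input layout, and the choice to average over the tokens of a step are assumptions made for illustration. They are not taken from the paper, and the code implements neither AutoMeco nor MIRA.

```python
import math
from typing import Dict, Sequence


def step_lenses(
    token_dists: Sequence[Sequence[float]],
    chosen: Sequence[int],
) -> Dict[str, float]:
    """Compute simple confidence measures for one reasoning step.

    token_dists[t] is the model's probability distribution over a (toy)
    vocabulary at position t; chosen[t] is the index of the generated token.
    Both the layout and the per-step averaging are illustrative assumptions.
    """
    if not token_dists or len(token_dists) != len(chosen):
        raise ValueError("need one distribution per generated token")

    n = len(chosen)
    # Log-probability of each token that was actually generated.
    log_probs = [math.log(dist[i]) for dist, i in zip(token_dists, chosen)]
    # Shannon entropy (in nats) of each next-token distribution.
    entropies = [
        -sum(p * math.log(p) for p in dist if p > 0.0) for dist in token_dists
    ]

    return {
        # Perplexity: exp of the negative mean log-probability.
        "perplexity": math.exp(-sum(log_probs) / n),
        # Mean entropy: higher values indicate a less certain step.
        "mean_entropy": sum(entropies) / n,
        # Mean probability assigned to the generated tokens.
        "mean_probability": sum(math.exp(lp) for lp in log_probs) / n,
    }


# A step generated from uniform distributions vs. a confidently generated step.
uncertain = step_lenses([[0.25] * 4, [0.25] * 4], [0, 3])
confident = step_lenses([[0.97, 0.01, 0.01, 0.01]], [0])
```

Under these assumptions, a step the model is unsure about has higher perplexity and entropy than a confidently generated one. Whether such scores track actual step correctness is the question a benchmark like AutoMeco is meant to answer.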
