Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens

Ziyang Ma, Qingyue Yuan, Zhenglin Wang, Deyu Zhou

TL;DR

This work studies the reliability of large language models through their intrinsic meta-cognition. It proposes Automated Meta-cognition Evaluation (AutoMeco), which benchmarks meta-cognition lenses without human annotations, and introduces a training-free Markovian Intrinsic Reward Adjustment (MIRA) to improve step-level signals. AutoMeco uses a Process Reward Model (PRM) as a judge to annotate step correctness, then evaluates lenses such as entropy, perplexity, and probability-based measures on three mathematical reasoning datasets. The authors report that meta-cognition signals correlate with PRM judgments and that MIRA improves results in a majority of configurations across models and datasets, with only marginal latency overhead. The findings suggest a feasible path to more reliable LLM reasoning through automated meta-cognition benchmarking and intrinsic reward adjustment, with implications for self-improvement and safer deployment of reasoning systems. The work also discusses limitations related to model access, overhead, and extending the framework to larger reasoning models.

Abstract

Previous research has primarily focused on the cognitive error detection capabilities of Large Language Models (LLMs), often prompting them to analyze mistakes in reasoning chains. However, few studies have examined the meta-cognitive abilities of LLMs (e.g., their self-awareness of step errors), which are crucial for their reliability. While studies on LLM self-evaluation present some measures, such as perplexity, which can reflect the answer correctness and be viewed as the lens of meta-cognition, they lack step-level analysis and adaptation. This paper studies the evaluation of LLM meta-cognition using the current lenses and how to improve these lenses. Specifically, we propose AutoMeco, an Automated Meta-cognition Evaluation framework for benchmarking the existing lenses. Furthermore, a training-free Markovian Intrinsic Reward Adjustment strategy, MIRA, is proposed to boost current meta-cognition lenses. Experimental results on three mathematical reasoning datasets and three LLMs show the reasonableness of AutoMeco by comparing it with Best-of-N verification. Moreover, the meta-cognition ability of LLMs can be better evaluated using MIRA.
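
To make the idea of a "lens" concrete, the sketch below computes two of the measures named above, entropy and perplexity, for a single reasoning step from token-level probabilities. It uses the standard textbook definitions; the function names (`token_entropy`, `step_mean_entropy`, `step_perplexity`) and the choice to average entropy over a step's tokens are illustrative assumptions, not necessarily the exact formulations used in the paper.

```python
import math
from typing import List


def token_entropy(dist: List[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def step_mean_entropy(dists: List[List[float]]) -> float:
    """Average token entropy over all tokens of one reasoning step.

    Averaging is an assumption made for this sketch; other aggregations
    (max, sum) are equally plausible.
    """
    return sum(token_entropy(d) for d in dists) / len(dists)


def step_perplexity(chosen_probs: List[float]) -> float:
    """Perplexity of a step: exp of the mean negative log-probability
    of the tokens the model actually generated."""
    nll = -sum(math.log(p) for p in chosen_probs) / len(chosen_probs)
    return math.exp(nll)


# Toy example: a step of two tokens over a two-word vocabulary.
dists = [[0.5, 0.5], [1.0, 0.0]]   # full distributions per position
chosen = [0.5, 1.0]                # probability of each generated token
print(step_mean_entropy(dists), step_perplexity(chosen))
```

Under these definitions, lower entropy or perplexity would be read as higher model confidence in the step, which is the signal a lens exposes for comparison against step-correctness labels.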

Paper Structure

This paper contains 46 sections, 25 equations, 6 figures, 8 tables, and 2 algorithms.

Figures (6)

  • Figure 1: In reasoning tasks, error detection (a) focuses on LLMs' cognitive ability to analyze errors in reasoning steps. Self-evaluation (b) utilizes measures such as entropy as lenses to reflect self-awareness of answer rightness. Our work (c) studies the evaluation and improvement of the current lenses in reflecting LLM meta-cognition. Bold "correct" and "wrong" within boxes are ground truths of the answer or step correctness.
  • Figure 2: Intrinsic feature distributions of correct and incorrect steps of Qwen2.5-7B on GSM8K, MATH500, and MinervaMATH. Green and red contours represent features of correct and wrong steps.
  • Figure 3: Frequency of MIRA leading to improved, unchanged, and degraded performance for meta-cognition lenses on three LLMs (Qwen2.5-7B, Llama-3-8B-Instruct, Mistral-7B-Instruct) and three datasets (GSM8K, MATH500, MinervaMATH).
  • Figure 4: A demonstration of MIRA enhancing meta-cognition observation for Qwen2.5-7B on GSM8K. The thresholds are determined by maximizing the F1 score of step correctness prediction.
  • Figure 5: A demonstration of MIRA enhancing meta-cognition observation for Qwen2.5-7B on MATH500. The thresholds are determined by maximizing the F1 score of step correctness prediction.
  • ...and 1 more figure
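
The captions of Figures 4 and 5 mention thresholds chosen by maximizing the F1 score of step correctness prediction. A minimal sketch of that kind of threshold search is shown below. It assumes that each step has a scalar lens score where higher means "more likely correct", that correct steps are the positive class, and that candidate thresholds are the observed scores themselves; the helper names `f1_score` and `best_f1_threshold` are hypothetical, and the paper's actual procedure may differ.

```python
from typing import List, Tuple


def f1_score(preds: List[bool], labels: List[bool]) -> float:
    """F1 of binary predictions, treating True (correct step) as positive."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Try each observed score as a threshold (predict correct if score >= t)
    and return the (threshold, F1) pair with the highest F1.
    Ties keep the smallest threshold."""
    best_t, best_f1 = float("nan"), -1.0
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        f1 = f1_score(preds, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Toy data: four steps with lens scores and judge labels (True = correct).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [False, False, True, True]
print(best_f1_threshold(scores, labels))
```

An exhaustive scan over observed scores is quadratic in the number of steps, which is acceptable for a sketch; sorting once and updating counts incrementally would scale better.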