Metacognition and Uncertainty Communication in Humans and Large Language Models

Mark Steyvers, Megan A. K. Peters

TL;DR

The paper addresses how metacognition—monitoring and evaluating one’s own knowledge—manifests in humans versus Large Language Models (LLMs) and why it matters for decision-making and collaboration. It surveys explicit and implicit uncertainty signals in LLMs, compares human and AI metacognitive architectures, and discusses how uncertainty is communicated in human–AI interactions. Key findings show that LLMs exhibit some metacognitive-like patterns yet differ in second-order representation, domain generality, and response behaviors, with training partially improving calibration and, to a lesser extent, sensitivity. The work highlights the potential to enhance human–AI collaboration and broader AI capabilities by developing more calibrated, self-directed metacognition and effective uncertainty communication.

Abstract

Metacognition, the capacity to monitor and evaluate one's own knowledge and performance, is foundational to human decision-making, learning, and communication. As large language models (LLMs) become increasingly embedded in both high-stakes and widespread low-stakes contexts, it is important to assess whether, how, and to what extent they exhibit metacognitive abilities. Here, we provide an overview of current knowledge of LLMs' metacognitive capacities, how they might be studied, and how they relate to our knowledge of metacognition in humans. We show that while humans and LLMs can sometimes appear quite aligned in their metacognitive capacities and behaviors, it is clear that many differences remain; attending to these differences is important for enhancing human-AI collaboration. Finally, we discuss how endowing future LLMs with more sensitive and more calibrated metacognition may also help them develop new capacities such as more efficient learning, self-direction, and curiosity.

Paper Structure

This paper contains 10 sections, 1 figure, 1 table.

Figures (1)

  • Figure 1: Demonstrations of confidence-accuracy relationships, using cartoons and empirical examples based on results from GPT-3.5 [steyvers2025large] and a confidence-finetuned GPT-4.1-mini model [steyvers2025llmfinetuning], focusing on metacognitive sensitivity and calibration. Top row: Confidence distributions for correct (green) and incorrect (blue) answers allow assessment of metacognitive sensitivity. The illustrative results show different degrees of separation between the two distributions, reflecting different degrees of metacognitive sensitivity. The empirical results using GPT-3.5 and GPT-4.1-mini show modest separation (AUC = 0.778 and AUC = 0.83, respectively), where the area under the curve (AUC) reflects the probability that a randomly selected correct answer is assigned higher confidence than a randomly selected incorrect answer. Bottom row: Metacognitive calibration can be seen by plotting accuracy as a function of confidence. The illustrative results show examples of overconfidence, underconfidence, and properly calibrated confidence (points directly along the diagonal). The GPT-3.5 results, based on implicit confidence signals from token likelihoods on multiple-choice questions (MMLU), show overconfidence: predicted confidence exceeds actual accuracy. In contrast, GPT-4.1-mini was finetuned to generate explicit verbal confidence estimates on short-answer trivia questions (TriviaQA), yielding improved calibration.
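
The two quantities described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' analysis code. The function names, the equal-width binning, and the toy data are assumptions made for illustration only. `auc_sensitivity` computes the probability that a randomly chosen correct answer receives higher confidence than a randomly chosen incorrect one (ties counted as one half), and `calibration_curve` reports accuracy as a function of binned confidence.

```python
from itertools import product


def auc_sensitivity(confidences, correct):
    """Metacognitive sensitivity as AUC: P(conf of a correct answer > conf of an
    incorrect answer), with ties counted as 0.5."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def calibration_curve(confidences, correct, n_bins=5):
    """Return (mean confidence, accuracy, count) for each non-empty bin.

    Confidences are assumed to lie in [0, 1]; bins are equal-width
    (a hypothetical choice, not taken from the paper)."""
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # clamp c == 1.0 into the last bin
        bins[idx].append((c, ok))
    curve = []
    for items in bins:
        if items:
            mean_conf = sum(c for c, _ in items) / len(items)
            accuracy = sum(1 for _, ok in items if ok) / len(items)
            curve.append((mean_conf, accuracy, len(items)))
    return curve


# Toy data (hypothetical, not from the paper).
confidences = [0.9, 0.8, 0.6, 0.7, 0.3]
correct = [True, True, True, False, False]

print(auc_sensitivity(confidences, correct))
for mean_conf, accuracy, count in calibration_curve(confidences, correct, n_bins=2):
    # mean_conf > accuracy suggests overconfidence in that bin; < suggests underconfidence.
    print(f"confidence={mean_conf:.2f} accuracy={accuracy:.2f} n={count}")
```

In this sketch, points where mean confidence equals accuracy would fall on the diagonal of the bottom-row plots. The pairwise AUC loop is quadratic in the number of answers, which is fine for small illustrations but would likely be replaced by a rank-based computation on large evaluation sets.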