Agentic Uncertainty Reveals Agentic Overconfidence

Jean Kaddour; Srijan Patel; Gbètondji Dovonon; Leo Richter; Pasquale Minervini; Matt J. Kusner

TL;DR

The paper investigates agentic uncertainty, asking whether AI agents can reliably predict their own multi-step task success. It formalizes the problem with $P(IS) = P(\text{agent}_{M} \text{ succeeds on } t \,\big|\, \mathcal{I})$ and compares pre-, mid-, and post-execution uncertainty agents, plus an adversarial post-execution variant, across 100 SWE-bench Pro tasks and three frontier models. Across regimes, agents exhibit pervasive overconfidence and calibration challenges, though adversarial framing and pre-execution estimates show improvements in calibration and discrimination, respectively. The findings emphasize the limits of self-assessment for autonomous coding workflows and advocate for hybrid deployment strategies that combine diverse uncertainty signals with human oversight to ensure safe and reliable decision-making. Overall, agentic self-assessment remains a critical safety challenge as AI systems scale to longer, more complex, and higher-stakes tasks.

Abstract

Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhibit agentic overconfidence: some agents that succeed only 22% of the time predict 77% success. Counterintuitively, pre-execution assessment with strictly less information tends to yield better discrimination than standard post-execution review, though differences are not always significant. Adversarial prompting reframing assessment as bug-finding achieves the best calibration.
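The two quantities the abstract contrasts, overconfidence and discrimination, can be illustrated with a small sketch. The overconfidence gap follows the definition in the Figure 1 caption (estimated success probability minus true success rate). Using AUROC as the discrimination measure is an assumption here, since this page does not name the metric. The function names and the five-task data set are hypothetical and purely illustrative, not results from the paper.

```python
from typing import Sequence


def overconfidence_gap(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean predicted success probability minus observed success rate.

    Positive values mean the agent predicts more success than it achieves.
    """
    if len(confidences) != len(outcomes) or not confidences:
        raise ValueError("need equally sized, non-empty inputs")
    mean_confidence = sum(confidences) / len(confidences)
    success_rate = sum(outcomes) / len(outcomes)
    return mean_confidence - success_rate


def auroc(confidences: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random success receives higher confidence than a
    random failure (ties count as one half). 0.5 means no discrimination."""
    successes = [c for c, y in zip(confidences, outcomes) if y == 1]
    failures = [c for c, y in zip(confidences, outcomes) if y == 0]
    if not successes or not failures:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for s in successes:
        for f in failures:
            if s > f:
                wins += 1.0
            elif s == f:
                wins += 0.5
    return wins / (len(successes) * len(failures))


# Hypothetical estimates on five tasks, rescaled from [0, 100] to [0, 1].
example_confidences = [0.90, 0.80, 0.95, 0.70, 0.85]
example_outcomes = [1, 0, 0, 0, 1]  # 1 = task solved, 0 = task failed

gap = overconfidence_gap(example_confidences, example_outcomes)
disc = auroc(example_confidences, example_outcomes)
print(f"overconfidence gap: {gap:.2f}, AUROC: {disc:.2f}")
```

The two measures are independent: an agent that reports the same high confidence on every task has a large gap and an AUROC of exactly 0.5, which is why a method can improve one without improving the other.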

Paper Structure (33 sections, 1 equation, 9 figures, 3 tables)

Introduction
Methods
Problem Setup
Uncertainty Agents
Pre-Execution Agent
Mid-Execution Agent
Post-Execution Agent
Adversarial post-execution variant.
Experiments
Setup
Pervasive Overconfidence
Less Information, Better Discrimination
Mid-Execution: Uninformative Doubt
Adversarial Framing Reduces Overconfidence
Shift vs. signal decomposition.
...and 18 more sections

Figures (9)

Figure 1: Agentic overconfidence. We measure the overconfidence as the difference between the estimated success probability and the true success probability (true rates: GPT-5.2 Codex 35%, Gemini-3-Pro 22%, Opus 4.5 27%). We plot three strategies: pre-, post-, and adversarial-post-execution. All agents systematically overestimate their success.
Figure 2: Agentic Uncertainty Regimes. Each regime observes different information. Post-execution and adversarial post-execution occur at the same point but use different prompts.
Figure 3: Uncertainty Agent Prompt Excerpts. Pre-execution explores the codebase before any solution attempt. Mid-execution evaluates an agent's partial trajectory for signs of progress or struggle. Post-execution reviews a proposed patch. Adversarial post-execution explicitly prompts bug-finding before estimation. All agents output probability estimates in $[0,100]$.
Figure 4: Distribution of post-execution confidence estimates by model. Success cases shown above the axis (green), failure cases below (red); dashed lines indicate base rates. Mirror symmetry reveals indistinguishable distributions: where bars match above and below, the model assigns identical confidence regardless of outcome. Gemini exhibits the most extreme pattern: nearly all predictions cluster at 100% confidence, creating dramatic mirrored towers. This visual symmetry directly explains the poor discrimination: high-confidence predictions provide no signal about actual success.
Figure 5: Calibration curves reveal systematic overconfidence. Points below the diagonal (shaded region) indicate overconfidence: models predict higher success probability than achieved. All methods fall in this region across all models. Gemini shows the most severe miscalibration: predictions near 100% yield only $\sim$20% accuracy. The adversarial method (triangles) consistently shifts curves upward toward the diagonal, achieving the best calibration, while pre-execution (circles) shows less extreme overconfidence than standard post-execution (squares) for GPT and Claude.
...and 4 more figures
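The calibration curves described in the Figure 5 caption can be approximated by binning confidence estimates and comparing each bin's mean confidence with its observed success rate. The sketch below assumes equal-width bins and adds an expected calibration error (ECE) summary. The bin count, the function names, and the example data are assumptions for illustration, not the paper's procedure or results.

```python
from typing import List, Sequence, Tuple


def calibration_bins(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Return (mean confidence, observed success rate, count) per non-empty bin.

    Confidences are expected in [0, 1]; divide [0, 100] estimates by 100 first.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, outcomes):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(c * n_bins), n_bins - 1)
        bins[index].append((c, y))
    rows = []
    for members in bins:
        if members:
            mean_conf = sum(c for c, _ in members) / len(members)
            accuracy = sum(y for _, y in members) / len(members)
            rows.append((mean_conf, accuracy, len(members)))
    return rows


def expected_calibration_error(
    confidences: Sequence[float], outcomes: Sequence[int], n_bins: int = 5
) -> float:
    """Count-weighted average of |mean confidence - success rate| over bins."""
    total = len(confidences)
    rows = calibration_bins(confidences, outcomes, n_bins)
    return sum(n / total * abs(conf - acc) for conf, acc, n in rows)


# Hypothetical agent that reports 100% confidence on every task but solves 1 of 5.
always_sure = [1.0, 1.0, 1.0, 1.0, 1.0]
results = [1, 0, 0, 0, 0]

curve = calibration_bins(always_sure, results)
ece = expected_calibration_error(always_sure, results)
print(curve, round(ece, 2))
```

A point whose success rate is lower than its mean confidence lies below the diagonal, which is the overconfident region the caption refers to.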
