Self-Anchoring Calibration Drift in Large Language Models: How Multi-Turn Conversations Reshape Model Confidence

Harshavardhan

Abstract

We introduce Self-Anchoring Calibration Drift (SACD), a hypothesized tendency for large language models (LLMs) to show systematic changes in expressed confidence when building iteratively on their own prior outputs across multi-turn conversations. We report an empirical study comparing three frontier models -- Claude Sonnet 4.6, Gemini 3.1 Pro, and GPT-5.2 -- across 150 questions spanning factual, technical, and open-ended domains, using three conditions: single-turn baseline (A), multi-turn self-anchoring (B), and independent repetition control (C). Results reveal a complex, model-heterogeneous pattern that partially diverges from pre-registered hypotheses. Claude Sonnet 4.6 exhibited significant decreasing confidence under self-anchoring (mean CDS = -0.032, t(14) = -2.43, p = .029, d = -0.627), while also showing significant calibration error drift (F(4,56) = 22.77, p < .001, eta^2 = .791). GPT-5.2 showed the opposite pattern in open-ended domains (mean CDS = +0.026) with significant ECE escalation by Turn 5. Gemini 3.1 Pro showed no significant CDS (t(14) = 0.38, p = .710), but its Condition C data reveals a striking ECE pattern: without self-anchoring, Gemini's calibration error drops from .327 to near zero across repetitions, whereas self-anchoring holds ECE flat at approximately .333 -- indicating that SACD can manifest as suppression of natural calibration improvement rather than ac

Paper Structure (32 sections, 2 equations, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Mean expressed confidence across five turns under Condition B (Self-Anchoring) for each model, by domain. Dashed horizontal lines indicate single-turn baseline means from Condition A. Shaded bands show 95% CIs.
  • Figure 2: Expected Calibration Error (ECE) across turns by model and domain. Solid lines = Condition B (Self-Anchoring); dashed lines = Condition C (Independent Repetition). Note the Gemini 3.1 Pro panel: the dashed orange line (C) drops from .327 at Turn 1 to near zero by Turn 2, a rapid natural calibration improvement that is entirely absent in Condition B (solid), where ECE stays flat at $\approx .333$.
  • Figure 3: Reliability diagrams for Claude Sonnet 4.6 (Condition B), comparing Turn 1 (blue) and Turn 5 (pink) across domains. The dashed diagonal represents perfect calibration. Points above the diagonal indicate overconfidence.
  • Figure 4: Mean Confidence Drift Score (CDS $= T5 - T1$) for Conditions B and C by model. Error bars show 95% CIs. Positive values indicate confidence escalation; negative values indicate suppression.
  • Figure 5: Confidence Drift Score (CDS $= T5 - T1$) by model and domain (Condition B). Blue indicates confidence suppression; red indicates escalation. All models show CDS $\approx 0$ for factual questions.
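
The two quantities that recur in the abstract and captions, CDS $= T5 - T1$ and ECE, can be illustrated with a minimal Python sketch. This is not the paper's analysis code. The function names, the equal-width 10-bin ECE scheme, and the use of a one-sample t-test with $d$ = mean / SD are assumptions made here for illustration. That last assumption is at least consistent with the reported Claude Sonnet 4.6 values, since $-0.627 \times \sqrt{15} \approx -2.43$ matches $t(14) = -2.43$. What the 15 units of analysis are is not stated in this excerpt.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_drift_score(turn_confidences):
    """CDS = T5 - T1: final-turn confidence minus first-turn confidence.

    `turn_confidences` is an ordered list of expressed confidences in [0, 1],
    one per turn (hypothetical input format).
    """
    return turn_confidences[-1] - turn_confidences[0]


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE (assumed scheme; the paper's binning may differ).

    ECE = sum over bins of (bin size / N) * |mean confidence - accuracy|.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = mean(c for c, _ in members)
        accuracy = mean(1.0 if ok else 0.0 for _, ok in members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def one_sample_t_and_d(scores):
    """One-sample t statistic against zero, and Cohen's d = mean / SD.

    Assumes the CDS values are tested against a null of zero drift.
    """
    m, s = mean(scores), stdev(scores)
    d = m / s
    t = m / (s / sqrt(len(scores)))
    return t, d


if __name__ == "__main__":
    # Made-up confidences for one question across five turns.
    print(confidence_drift_score([0.80, 0.79, 0.78, 0.77, 0.75]))
    # Made-up answers: four at 0.9 confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False]))
```

Under this definition, a negative CDS corresponds to the confidence suppression shown in blue in Figure 5, and a positive CDS to escalation.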