Learning to Trust: How Humans Mentally Recalibrate AI Confidence Signals

ZhaoBin Li, Mark Steyvers

Abstract

Productive human-AI collaboration requires appropriate reliance, yet contemporary AI systems are often miscalibrated, exhibiting systematic overconfidence or underconfidence. We investigate whether humans can learn to mentally recalibrate AI confidence signals through repeated experience. In a behavioral experiment (N = 200), participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. Results demonstrate robust learning across all conditions: over 50 trials, participants significantly improved their accuracy, discrimination, and calibration alignment. We present a computational model that combines a linear-in-log-odds (LLO) transformation with a Rescorla-Wagner learning rule to explain these dynamics. The model reveals that humans adapt by updating their baseline trust and their sensitivity to confidence, using asymmetric learning rates to prioritize the most informative errors. While humans can compensate for monotonic miscalibration, we identify a significant boundary in the reverse confidence scenario, where a substantial proportion of participants struggled to override their initial inductive biases. These findings provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.
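
The abstract names the model's two components but does not give their equations here. The following is a minimal sketch under stated assumptions, not the authors' fitted model: perceived P(AI correct) is taken as sigmoid(beta + gamma * logit(c)), where beta plays the role of baseline trust and gamma the sensitivity to the displayed confidence c; after each trial, a Rescorla-Wagner (delta-rule) prediction error updates both parameters; and the asymmetry is modeled as separate rates for positive and negative prediction errors. The class name, the parameter names alpha_pos and alpha_neg, their values, and the clipping constant are all hypothetical.

```python
import math
from dataclasses import dataclass


def logit(p: float, eps: float = 0.01) -> float:
    """Log-odds of p, clipped so that displayed 0% / 100% confidence stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class LLORescorlaWagner:
    """Hypothetical learner: P(AI correct) = sigmoid(beta + gamma * logit(conf))."""
    beta: float = 0.0       # baseline trust (intercept in log-odds); assumed start
    gamma: float = 1.0      # sensitivity to the AI's stated confidence; assumed start
    alpha_pos: float = 0.2  # assumed rate when the AI did better than expected
    alpha_neg: float = 0.1  # assumed rate when the AI did worse than expected

    def predict(self, conf: float) -> float:
        """Perceived probability that the AI is correct at this confidence."""
        return sigmoid(self.beta + self.gamma * logit(conf))

    def update(self, conf: float, ai_correct: bool) -> float:
        """Apply one delta-rule update to beta and gamma; return the prediction error."""
        x = logit(conf)
        error = float(ai_correct) - self.predict(conf)
        alpha = self.alpha_pos if error > 0 else self.alpha_neg
        self.beta += alpha * error       # shift overall trust
        self.gamma += alpha * error * x  # rescale reliance on the confidence signal
        return error


if __name__ == "__main__":
    # Toy "reverse confidence" stream: high confidence is wrong, low confidence is right.
    learner = LLORescorlaWagner()
    for _ in range(100):
        learner.update(0.9, ai_correct=False)
        learner.update(0.3, ai_correct=True)
    print(f"beta={learner.beta:.2f} gamma={learner.gamma:.2f}")
    print(f"P(correct | 90%)={learner.predict(0.9):.2f}")
    print(f"P(correct | 30%)={learner.predict(0.3):.2f}")
```

In this sketch, the overconfidence and underconfidence conditions only require shrinking or stretching gamma (and shifting beta), whereas the reverse condition requires gamma to cross zero and change sign. This is one way to picture why a learner whose prior assumes a monotone mapping might find that condition harder.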

Paper Structure (23 sections, 3 equations, 7 figures)


Figures (7)

  • Figure 1: Experimental trial structure: participants view a 1-second animation of colored dots, then receive the AI's prediction of which color has the most dots, along with its confidence score rounded to the nearest 10%. Participants then judge whether the AI was correct or wrong and receive immediate feedback on their accuracy.
  • Figure 2: Probability densities of AI confidence for correct and wrong decisions across the four conditions. Probabilities are rounded to the nearest 10%, as shown in the experimental interface, and normalized across both distributions.
  • Figure 3: Accuracy improvement across trial blocks for human participants and the cognitive model in all four AI calibration conditions. Error bars represent 95% confidence intervals; the dashed line indicates 50% chance accuracy.
  • Figure 4: Hit rates increase and false alarm rates decrease across trial blocks for human participants and the cognitive model in all four AI calibration conditions. Error bars represent 95% confidence intervals; the dashed line indicates the 50% chance level.
  • Figure 5: Calibration curves showing AI accuracy (orange) and human-perceived AI accuracy (blue, dashed) across conditions. The left panel shows prior calibration at trial 1, derived from experiment-wide priors. The remaining panels show posterior calibration at trial 50, derived from participant-level posteriors, for each experimental condition. The black diagonal dashed line marks perfect calibration. Shaded regions show 95% credible intervals.
  • ...and 2 more figures
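
Figures 4 and 5 summarize behavior as hit and false alarm rates and as calibration curves. The sketch below shows one plausible way to compute these summaries from trial data. It assumes that a hit is a "correct" judgment on a trial where the AI was right, that a false alarm is a "correct" judgment on a trial where the AI was wrong, and that calibration is tabulated at each displayed confidence level (a multiple of 10%). The function names are hypothetical, and the paper's own analysis, which reports Bayesian credible intervals in Figure 5, may differ.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def hit_and_false_alarm_rates(
    ai_correct: Sequence[bool], said_correct: Sequence[bool]
) -> Tuple[float, float]:
    """Hit rate = P(judge 'correct' | AI correct); FA rate = P(judge 'correct' | AI wrong)."""
    if len(ai_correct) != len(said_correct):
        raise ValueError("sequences must have equal length")
    hits = sum(1 for a, s in zip(ai_correct, said_correct) if a and s)
    false_alarms = sum(1 for a, s in zip(ai_correct, said_correct) if not a and s)
    n_correct = sum(1 for a in ai_correct if a)
    n_wrong = len(ai_correct) - n_correct
    hit_rate = hits / n_correct if n_correct else float("nan")
    fa_rate = false_alarms / n_wrong if n_wrong else float("nan")
    return hit_rate, fa_rate


def calibration_curve(
    confidences: Sequence[float], ai_correct: Sequence[bool]
) -> Dict[int, float]:
    """Empirical AI accuracy at each displayed confidence level, keyed by percent."""
    bins = defaultdict(list)
    for conf, correct in zip(confidences, ai_correct):
        level = round(conf * 10) * 10  # e.g. 0.7 -> 70; absorbs float noise
        bins[level].append(correct)
    return {level: sum(v) / len(v) for level, v in sorted(bins.items())}


if __name__ == "__main__":
    ai = [True, True, False, False]
    judged = [True, False, True, False]
    print(hit_and_false_alarm_rates(ai, judged))
    print(calibration_curve([0.9, 0.9, 0.3, 0.3], ai))
```

A perfectly calibrated AI would have a calibration curve that matches its own keys (accuracy at 70% confidence equal to 0.7, and so on), which corresponds to the diagonal in Figure 5.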