"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making

Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, Xiaojuan Ma

TL;DR

The paper investigates how calibrating human self-confidence affects the rationality of reliance and task performance in AI-assisted decision-making. It introduces a Confidence-Correctness Matching framework that combines human and AI confidence signals to diagnose inappropriate reliance, and it evaluates three calibration mechanisms (Think the Opposite, Thinking in Bets, and Calibration Status Feedback) across multiple income-prediction studies. Findings show that human self-confidence calibration can improve task performance and reliance appropriateness in many settings, but the benefits can hinge on whether the AI's confidence is also aligned with its correctness; misalignment can produce adverse effects. The work offers design recommendations and a roadmap for integrating human confidence calibration into future AI-assisted interfaces, highlighting both practical benefits and limitations. Overall, it provides a nuanced, multi-study understanding of how calibrated human confidence shapes collaboration with probabilistic AI systems.

Abstract

In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then, in our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.

Paper Structure (71 sections, 11 equations, 13 figures, 2 tables)

Introduction
Related work
Appropriate Reliance in AI-Assisted Decision-Making and Its Measurements
Enhancing Appropriate Reliance in AI-Assisted Decision-Making
Human Self-confidence in Decision Making and the Calibration
Unpacking Inappropriate Reliance from a Human Self-Confidence Perspective
Appropriateness of Human Self-Confidence
Existing Measurements of Confidence Appropriateness at A Task Level
Measuring Confidence Appropriateness at An Instance Level
An Analytical Framework Integrating Human and AI Confidence Appropriateness
How can we use the proposed analytical framework?
Study 1 - Understanding the Relationship between Human Self-Confidence Appropriateness and Reliance Appropriateness
Research Questions
Task and AI Model
Task
...and 56 more sections

Figures (13)

Figure 1: Reliability diagrams for a binary classification task (Guo et al., 2017), illustrating calibrated confidence (left, the actual accuracy aligns with the stated confidence), over-confidence (middle, the actual accuracy falls below the stated confidence), and under-confidence (right, the actual accuracy is above the stated confidence).
Figure 2: A space of different combinations of 1) initial human prediction correctness and confidence, 2) AI suggestion correctness and its confidence, and 3) human final decision correctness, at a task instance level. To save space, we only highlight situations where a human's initial prediction differs from the AI's suggestion and the human's final decision is incorrect. Comparing (a) and (b), (a) may induce more incorrect AI reliance due to Human C-C Mismatched (Low&Correct). Similarly, (c) may lead to more incorrect self-reliance due to Human C-C Mismatched (High&Incorrect).
Figure 3: The interface and procedure for making a prediction on a task instance.
Figure 4: An analysis of error rate in different human and AI Confidence-Correctness Matching situations. The left shows the four categories considering both human and AI C-C Matching. The right shows the two categories only considering human C-C Matching no matter whether AI is C-C Matched or not. Error bars indicate standard errors. (*: $p < 0.05$; **: $p < 0.01$; ***: $p < 0.001$)
Figure 5: Interfaces of different self-confidence calibration conditions. (A) Think the Opposite. (B) Thinking in Bets. (C) Calibration Status Feedback contains two views, (1) real-time feedback during the decision-making process and (2) post-hoc feedback after a batch of decision tasks.
...and 8 more figures
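
The captions of Figures 1 and 2 describe two ideas that a short sketch may make more concrete: task-level calibration (stated confidence versus actual accuracy) and instance-level Confidence-Correctness (C-C) matching. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`reliability_bins`, `calibration_gap`, `cc_matched`), the equal-width binning, and the 0.75 cutoff between "high" and "low" confidence are all hypothetical choices made for this example; this summary does not state how the paper sets them.

```python
from typing import List, Sequence, Tuple


def reliability_bins(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Group predictions into equal-width confidence bins over [0, 1].

    Returns (mean confidence, accuracy, count) for each non-empty bin,
    which is the data a reliability diagram plots.
    """
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # [conf sum, hits, count]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        totals[idx][0] += conf
        totals[idx][1] += int(ok)
        totals[idx][2] += 1
    return [(s / n, hits / n, n) for s, hits, n in totals if n > 0]


def calibration_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean stated confidence minus accuracy.

    Positive values suggest over-confidence, negative values under-confidence.
    """
    n = len(confidences)
    return sum(confidences) / n - sum(int(ok) for ok in correct) / n


def cc_matched(confidence: float, is_correct: bool, threshold: float = 0.75) -> bool:
    """Instance-level C-C matching (threshold is an assumed cutoff).

    Matched: High&Correct or Low&Incorrect.
    Mismatched: High&Incorrect or Low&Correct.
    """
    is_high = confidence >= threshold
    return is_high == is_correct


if __name__ == "__main__":
    confs = [0.95, 0.95, 0.65, 0.65]
    hits = [True, False, True, False]
    for mean_conf, acc, count in reliability_bins(confs, hits):
        print(f"confidence={mean_conf:.2f} accuracy={acc:.2f} n={count}")
    print(f"gap={calibration_gap(confs, hits):+.2f}")
    print([cc_matched(c, h) for c, h in zip(confs, hits)])
```

In this toy data the decision maker states 80% confidence on average but is right only half the time, so the gap is positive (over-confident). The two mismatched instances are the High&Incorrect and Low&Correct cases that the Figure 2 caption associates with incorrect self-reliance and incorrect AI reliance, respectively.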