"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making
Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, Xiaojuan Ma
TL;DR
The paper investigates how calibrating human self-confidence affects the rationality of people's reliance on AI and their task performance in AI-assisted decision-making. It introduces a Confidence-Correctness Matching framework that combines human and AI confidence signals to diagnose inappropriate reliance, and it evaluates three calibration mechanisms (Think Opposite, Thinking in Bets, and Calibration Status Feedback) across multiple income-prediction studies. Findings show that human self-confidence calibration can improve task performance and reliance appropriateness in many settings, but the benefits can hinge on AI confidence alignment; misalignment can produce adverse effects. The work offers design recommendations and a roadmap for integrating human confidence calibration into future AI-assisted interfaces, highlighting both practical benefits and limitations. Overall, it provides a nuanced, multi-study understanding of how calibrated human confidence shapes collaboration with probabilistic AI systems.
Abstract
In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective: "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then, in our second study, we propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces.
