Towards Uncertainty Aware Task Delegation and Human-AI Collaborative Decision-Making

Min Hun Lee, Martyn Zhe Yu Tok

TL;DR

This work addresses how uncertainty information should be communicated to support human-AI collaboration in high-stakes decision-making. By comparing distance-based uncertainty visualizations against traditional probability-based scores and enabling interactive threshold exploration, the authors demonstrate improved decision accuracy and reduced overreliance in a stroke rehabilitation assessment task. The study combines a neural-network-based decision-support system with embedding-based explanations and SHAP-driven features, validated in a mixed cohort of domain experts and novices. The findings suggest distance-based uncertainty representations with interactive, example-based explanations can meaningfully enhance analytical engagement and trustworthy AI-assisted decision-making in healthcare, while highlighting ongoing challenges in participant education and interface design.

Abstract

Despite the growing promise of artificial intelligence (AI) in supporting decision-making across domains, fostering appropriate human reliance on AI remains a critical challenge. In this paper, we investigate the utility of exploring distance-based uncertainty scores for task delegation to AI and describe how these scores can be visualized through embedding representations for human-AI decision-making. After developing an AI-based system for physical stroke rehabilitation assessment, we conducted a study with 19 health professionals and 10 students in medicine/health to understand the effect of exploring distance-based uncertainty scores on users' reliance on AI. Our findings showed that distance-based uncertainty scores outperformed traditional probability-based uncertainty scores in identifying uncertain cases. In addition, after exploring confidence scores for task delegation and reviewing embedding-based visualizations of distance-based uncertainty scores, participants achieved an 8.20% higher rate of correct decisions, a 7.15% higher rate of changing their decisions to correct ones, and a 7.14% lower rate of incorrect changes after reviewing AI outputs than those reviewing probability-based uncertainty scores ($p<0.01$). Our findings highlight the potential of distance-based uncertainty scores to enhance decision accuracy and appropriate reliance on AI while discussing ongoing challenges for human-AI collaborative decision-making.

Paper Structure

This paper contains 44 sections, 3 equations, 3 figures, 9 tables.

Figures (3)

  • Figure 1: (a) Threshold Exploration: Users can explore different thresholds of confidence scores, review AI performance on delegated cases from a held-out dataset, and specify a confidence threshold for delegating cases to AI (e.g., delegating cases with confidence scores above 60% to AI). (b) Task Delegation: Users can review AI confidence scores on assigned cases and identify which ones require human review. 2x2 Experimental Conditions: Participants, with or without exploring a threshold, interacted with two interfaces showing numerical and distance-based confidence scores.
  • Figure 2: Interface for AI-assisted decision-making. For each case, the system shows the video of a patient along with an AI-predicted score for the patient's quality of motion, confidence scores, example-based explanations (i.e., relevant images and information), and important-feature explanations. For confidence scores, our work compares a distance-based visualization with a numerical presentation: (1) numerical confidence scores, representing the highest class probability output by the model, and (2) distance-based confidence scores, computed by measuring the distance $d$ between the input and the centroid of each class (among $K$ classes), normalizing it as $1 - \frac{d}{d_{\max}}$, and applying a softmax function.
  • Figure 3: Performance of AI-assisted decision-making on rehabilitation assessment tasks by all participants (All), participants who did not explore a threshold of confidence scores (NoExp), participants who explored a threshold of confidence scores (Exp), domain experts, therapists (TPs), and novices (NVs) (e.g., health professionals and students).
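
The distance-based confidence score described in the Figure 2 caption, and the threshold-based delegation described in the Figure 1 caption, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `distance_confidence` and `delegation_summary` are hypothetical, Euclidean distance is assumed, and $d_{\max}$ is assumed to be the largest of the $K$ centroid distances for the given input unless a value is passed in, since the captions do not specify these details.

```python
import numpy as np


def distance_confidence(x, centroids, d_max=None):
    """Per-class confidence from distances to class centroids.

    x         : embedding of one input, shape (D,)
    centroids : one centroid per class, shape (K, D)
    d_max     : normalizing distance; ASSUMPTION: defaults to the largest
                of the K distances for this input.
    """
    x = np.asarray(x, dtype=float)
    centroids = np.asarray(centroids, dtype=float)

    # Distance d between the input and each class centroid (Euclidean assumed).
    d = np.linalg.norm(centroids - x, axis=1)

    if d_max is None:
        d_max = d.max()
    if d_max <= 0:
        # Degenerate case: the input coincides with every centroid.
        return np.full(len(centroids), 1.0 / len(centroids))

    # Normalize as 1 - d / d_max, so closer centroids get larger scores.
    s = 1.0 - d / d_max

    # Numerically stable softmax over the K normalized scores.
    e = np.exp(s - s.max())
    return e / e.sum()


def delegation_summary(confidences, ai_correct, threshold):
    """Delegate cases whose confidence exceeds `threshold` to the AI.

    Returns the boolean delegation mask, the number of delegated cases,
    and the AI accuracy on those cases (NaN if none are delegated).
    """
    confidences = np.asarray(confidences, dtype=float)
    ai_correct = np.asarray(ai_correct, dtype=bool)

    delegated = confidences > threshold
    n_delegated = int(delegated.sum())
    accuracy = float(ai_correct[delegated].mean()) if n_delegated else float("nan")
    return delegated, n_delegated, accuracy


if __name__ == "__main__":
    # Hypothetical 2-D embeddings with three class centroids.
    centroids = [[0.0, 0.0], [3.0, 4.0], [6.0, 0.0]]
    probs = distance_confidence([0.5, 0.2], centroids)
    print("per-class confidence:", probs, "predicted class:", int(probs.argmax()))

    # Hypothetical held-out confidences and AI correctness flags.
    mask, n, acc = delegation_summary(
        [0.9, 0.5, 0.7, 0.65], [True, False, True, False], threshold=0.6
    )
    print("delegated:", mask, "count:", n, "AI accuracy on delegated:", acc)
```

Sweeping `threshold` over a held-out set and inspecting the returned count and accuracy mirrors the exploration step in Figure 1(a): a higher threshold delegates fewer cases to the AI, and the accuracy on those cases can then be compared across thresholds.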