Designing for Appropriate Reliance: The Roles of AI Uncertainty Presentation, Initial User Decision, and User Demographics in AI-Assisted Decision-Making

Shiye Cao, Anqi Liu, Chien-Ming Huang

TL;DR

This paper investigates how the presentation of AI uncertainty, the user's initial decision, and user demographics shape appropriate reliance in AI-assisted decision-making. Using an online skin cancer screening task, it compares four uncertainty presentations (Baseline, Raw Probability, Calibrated Probability, and Calibrated Frequency); calibration is performed with beta calibration. The key findings are that calibrated frequency presentations help users adjust their reliance to the AI's uncertainty and reduce confirmation bias, whereas calibration alone offers limited benefit; over-reliance persists overall. The results point toward personalized AI aids that account for uncertainty presentation, the user's initial decision, and user demographics to improve human-AI collaboration in critical decisions.
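
Beta calibration maps a raw model score s to a calibrated probability p through logit(p) = a·ln(s) − b·ln(1 − s) + c, which amounts to logistic regression on the two features ln(s) and −ln(1 − s). The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it fits (a, b, c) by plain gradient descent (the full method additionally constrains a, b ≥ 0), uses synthetic overconfident scores instead of the study's model, and adds a reliability-diagram binning with expected calibration error (ECE) in the spirit of Figure 1. All function names are hypothetical, and for brevity it fits and evaluates on the same data, whereas Figure 1 uses a held-out test set.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _beta_features(s: float, eps: float = 1e-6) -> Tuple[float, float]:
    # Beta calibration is logistic regression on ln(s) and -ln(1 - s).
    s = min(max(s, eps), 1.0 - eps)
    return math.log(s), -math.log(1.0 - s)


def fit_beta_calibration(scores: Sequence[float], labels: Sequence[int],
                         lr: float = 0.1, epochs: int = 500) -> Tuple[float, float, float]:
    """Fit (a, b, c) in logit(p) = a*ln(s) - b*ln(1 - s) + c by batch gradient descent.
    Simplification (assumption): the a >= 0 and b >= 0 constraints are not enforced."""
    a, b, c = 1.0, 1.0, 0.0  # identity map: logit(p) = logit(s)
    feats = [_beta_features(s) for s in scores]
    n = len(feats)
    for _ in range(epochs):
        ga = gb = gc = 0.0
        for (x1, x2), y in zip(feats, labels):
            err = _sigmoid(a * x1 + b * x2 + c) - y  # d(log loss)/d(logit)
            ga += err * x1
            gb += err * x2
            gc += err
        a, b, c = a - lr * ga / n, b - lr * gb / n, c - lr * gc / n
    return a, b, c


def beta_calibrate(s: float, params: Tuple[float, float, float]) -> float:
    a, b, c = params
    x1, x2 = _beta_features(s)
    return _sigmoid(a * x1 + b * x2 + c)


def reliability_bins(probs: Sequence[float], labels: Sequence[int], n_bins: int = 10):
    """Per non-empty bin: (mean predicted P(cancer), observed cancer rate, count), plus ECE."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows, ece = [], 0.0
    for members in bins:
        if not members:
            continue
        conf = sum(p for p, _ in members) / len(members)
        rate = sum(y for _, y in members) / len(members)
        rows.append((conf, rate, len(members)))
        ece += len(members) / len(probs) * abs(conf - rate)
    return rows, ece


# Synthetic data (assumption): the true P(cancer) is p, but the hypothetical model
# reports sigmoid(3 * logit(p)), which pushes its scores toward 0 and 1.
rng = random.Random(0)
true_p = [rng.uniform(0.02, 0.98) for _ in range(1000)]
labels = [1 if rng.random() < p else 0 for p in true_p]
raw = [_sigmoid(3 * math.log(p / (1 - p))) for p in true_p]

params = fit_beta_calibration(raw, labels)
calibrated = [beta_calibrate(s, params) for s in raw]
_, ece_raw = reliability_bins(raw, labels)
_, ece_cal = reliability_bins(calibrated, labels)
print(f"a, b, c = {params}; ECE raw = {ece_raw:.3f}, calibrated = {ece_cal:.3f}")
```

Because the synthetic scores are exactly sigmoid(3·logit(p)), the beta family contains the inverse map (a = b = 1/3, c = 0), so the fitted a and b should shrink well below 1 and the calibrated ECE should fall below the raw ECE.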

Abstract

Appropriate reliance is critical to achieving synergistic human-AI collaboration. For instance, when users over-rely on AI assistance, their human-AI team performance is bounded by the model's capability. This work studies how the presentation of model uncertainty may steer users' decision-making toward fostering appropriate reliance. Our results demonstrate that showing the calibrated model uncertainty alone is inadequate. Rather, calibrating model uncertainty and presenting it in a frequency format allow users to adjust their reliance accordingly and help reduce the effect of confirmation bias on their decisions. Furthermore, the critical nature of our skin cancer screening task skews participants' judgment, causing their reliance to vary depending on their initial decision. Additionally, step-wise multiple regression analyses revealed how user demographics such as age and familiarity with probability and statistics influence human-AI collaborative decision-making. We discuss the potential for model uncertainty presentation, initial user decision, and user demographics to be incorporated in designing personalized AI aids for appropriate reliance.
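
The stepwise multiple regression mentioned above selects demographic predictors one at a time. The sketch below is a forward-only variant that repeatedly adds the predictor with the largest gain in adjusted R² and stops once the gain drops below a threshold. The entry criterion, the `min_gain` value, the predictor names (`age`, `stats_familiarity`, `unrelated`), and the synthetic outcome are all assumptions for illustration; this summary does not state the paper's selection criteria, and stepwise procedures often use p-value or F-test thresholds for entry and removal instead.

```python
import random
from typing import Dict, List, Sequence


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def adjusted_r2(X: Dict[str, Sequence[float]], y: Sequence[float], names: List[str]) -> float:
    """Fit OLS (with intercept) on the named predictors via the normal equations."""
    n, p = len(y), len(names)
    rows = [[1.0] + [X[name][i] for name in names] for i in range(n)]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    y_mean = sum(y) / n
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def forward_stepwise(X: Dict[str, Sequence[float]], y: Sequence[float],
                     min_gain: float = 0.01) -> List[str]:
    """Greedily add the predictor that most improves adjusted R^2; stop below min_gain."""
    selected: List[str] = []
    best = 0.0  # adjusted R^2 of the intercept-only model
    remaining = list(X)
    while remaining:
        scores = {name: adjusted_r2(X, y, selected + [name]) for name in remaining}
        name = max(scores, key=lambda key: scores[key])
        if scores[name] - best < min_gain:
            break
        selected.append(name)
        remaining.remove(name)
        best = scores[name]
    return selected


# Hypothetical predictors and synthetic outcome, for illustration only.
rng = random.Random(1)
n = 200
age = [rng.uniform(18, 70) for _ in range(n)]
stats_familiarity = [float(rng.randint(1, 5)) for _ in range(n)]
unrelated = [rng.uniform(0, 1) for _ in range(n)]
reliance = [0.9 - 0.005 * a + 0.04 * f + rng.gauss(0, 0.02)
            for a, f in zip(age, stats_familiarity)]
X = {"age": age, "stats_familiarity": stats_familiarity, "unrelated": unrelated}
selected = forward_stepwise(X, reliance)
print(selected)
```

Adjusted R² is used here because, unlike plain R², it penalizes each added predictor, so a variable unrelated to the outcome (`unrelated` above) should not clear the `min_gain` threshold.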

Paper Structure

This paper contains 48 sections, 1 equation, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Reliability diagram of trained model showing a visual representation of model calibration. The diagram plots the expected sample accuracy as a function of model confidence for the cancer class on a held-out test set.
  • Figure 2: Overview of the four different model uncertainty presentations explored in the study.
  • Figure 3: Overview of the participants' decision-making process. Participants first provide an initial response. Then, the AI prediction is revealed to them. Regardless of whether the AI suggestion matched the participant's initial response, participants are asked to provide a final response on behalf of the human-AI team. The bottom of the figure shows the number and percentage of cases that belong to each branch in our study. Based on the correctness of the initial response, the AI suggestion, and the final response, we specify whether the user's choice in each branch should be considered appropriate reliance.
  • Figure 4: Example test cases from the experiment. Participants and the AI agent appeared to have complementary expertise: the AI made an incorrect prediction on a test case that most participants correctly identified as cancer, but it made correct predictions on two test cases that most participants initially misclassified.
  • Figure 5: (a) Distribution of initial user decisions. Participants were more likely to judge a case as cancer than benign, even though cancer is the less likely outcome. (b) Distribution of initial user decisions by correctness. Participants were much more likely to make a Type 1 error (false positive: calling a benign case cancer) than a Type 2 error (false negative) in their initial response. (c) Distribution of switches to the AI suggestion by initial human-AI match. When the AI suggestion matched the user's initial response, participants almost never switched to disagree with the AI, so their final response almost always still agreed with the AI suggestion. (d) Distribution of user confidence change for three groups of cases: (green) the AI suggestion matched the user's initial response, and the user did not switch to disagree with it (Figure 3, branches b, h); (blue) the AI suggestion mismatched the user's initial response, and the user switched to agree with the AI (Figure 3, branches c, e); (red) the AI suggestion mismatched the user's initial response, and the user did not switch to agree with it (Figure 3, branches d, f). Participants increased their confidence when the AI suggestion matched their initial response and decreased it when the AI suggestion mismatched their initial response and they kept their own answer. The decrease in confidence was smaller when users switched to agree with the AI than when they kept their own answer.
  • ...and 4 more figures
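
The Figure 5 analyses reduce to simple tallies over per-case records. The sketch below assumes a hypothetical `Trial` record (true label, initial response, AI suggestion, final response, and confidence before and after the AI is revealed; the confidence scale is an assumption) and computes Type 1 and Type 2 counts for initial responses as in panel (b), switch rates by initial human-AI match as in panel (c), and mean confidence change for the match/switch groups of panel (d). The mapping from Figure 3 branches to "appropriate reliance" is not spelled out in this summary, so the sketch does not attempt it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence


@dataclass
class Trial:
    """Hypothetical per-case record. True means "cancer", False means "benign"."""
    truth: bool
    initial: bool        # user's initial response
    ai: bool             # AI suggestion
    final: bool          # final human-AI team response
    initial_conf: float  # user confidence before seeing the AI suggestion (assumed scale)
    final_conf: float    # user confidence after seeing it


def initial_error_counts(trials: Sequence[Trial]) -> Dict[str, int]:
    # Type 1 error = false positive (benign called cancer); Type 2 = false negative.
    return {
        "type1": sum(1 for t in trials if t.initial and not t.truth),
        "type2": sum(1 for t in trials if t.truth and not t.initial),
    }


def switch_rate_by_match(trials: Sequence[Trial]) -> Dict[str, float]:
    # Share of cases whose final response differs from the initial one,
    # split by whether the AI suggestion matched the initial response.
    rates: Dict[str, float] = {}
    for label, matched in (("match", True), ("mismatch", False)):
        subset = [t for t in trials if (t.initial == t.ai) == matched]
        if subset:
            rates[label] = sum(t.final != t.initial for t in subset) / len(subset)
    return rates


def branch_group(t: Trial) -> str:
    # Groups corresponding to the green, blue, and red series of panel (d).
    if t.initial == t.ai:
        return "match_kept" if t.final == t.initial else "match_switched_away"
    return "mismatch_switched_to_ai" if t.final == t.ai else "mismatch_kept_own"


def confidence_change_by_group(trials: Sequence[Trial]) -> Dict[str, float]:
    # Mean (final - initial) confidence within each group.
    changes: Dict[str, List[float]] = {}
    for t in trials:
        changes.setdefault(branch_group(t), []).append(t.final_conf - t.initial_conf)
    return {g: mean(v) for g, v in changes.items()}


# Made-up example records, for illustration only.
trials = [
    Trial(truth=False, initial=True, ai=False, final=False, initial_conf=60, final_conf=50),
    Trial(truth=True, initial=True, ai=True, final=True, initial_conf=70, final_conf=90),
    Trial(truth=False, initial=True, ai=False, final=True, initial_conf=80, final_conf=60),
    Trial(truth=True, initial=False, ai=True, final=True, initial_conf=50, final_conf=45),
]
print(initial_error_counts(trials))  # {'type1': 2, 'type2': 1}
print(switch_rate_by_match(trials))
print(confidence_change_by_group(trials))
```

Keeping one record per case makes each panel a different grouping of the same data, which mirrors how the branches in Figure 3 are reused across the panels of Figure 5.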