Beyond AI advice -- independent aggregation boosts human-AI accuracy

Julian Berger, Pantelis P. Analytis, Ville Satopää, Ralf H. J. M. Kurvers

Abstract

Artificial intelligence (AI) is broadly deployed as an advisor to human decision-makers: AI recommends a decision and a human accepts or rejects the advice. This approach, however, has several limitations: People frequently ignore accurate advice and rely too much on inaccurate advice, and their decision-making skills may deteriorate over time. Here, we compare the AI-as-advisor approach to the hybrid confirmation tree (HCT), an alternative strategy that preserves the independence of human and AI judgments. The HCT elicits a human judgment and an AI judgment independently of each other. If they agree, that decision is accepted. If not, a second human breaks the tie. For the comparison, we used 10 datasets from various domains, including medical diagnostics and misinformation discernment, and a subset of four datasets in which AI also explained its decision. The HCT outperformed the AI-as-advisor approach in all datasets. The HCT also performed better in almost all cases in which AI offered an explanation of its judgment. Using signal detection theory to interpret these results, we find that the HCT outperforms the AI-as-advisor approach because people cannot discriminate well enough between correct and incorrect AI advice. Overall, the HCT is a robust, accurate, and transparent alternative to the AI-as-advisor approach, offering a simple mechanism to tap into the wisdom of hybrid crowds.
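The HCT's aggregation rule described above can be sketched in a few lines. This is an illustrative reconstruction, not code from the paper; the function and variable names are hypothetical.

```python
def hct_decision(human_judgment, ai_judgment, tiebreaker_judgment):
    """Return the HCT's final decision for one case.

    All three judgments are elicited independently of one another.
    """
    if human_judgment == ai_judgment:
        # The first human and the AI agree: accept the shared judgment.
        return human_judgment
    # Disagreement: a second, independent human breaks the tie.
    return tiebreaker_judgment
```

For binary decisions, the tiebreaker's vote always decides disagreements, since it necessarily matches either the human or the AI; the rule is thus equivalent to a majority vote among the three judges, while only consulting the tiebreaker when needed.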

Paper Structure

This paper contains 23 sections, 4 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Workflow and performance of the hybrid confirmation tree (HCT) and the AI-as-advisor approach. (A) Workflow of the HCT and the AI-as-advisor approach with and without AI explanations (XAI-as-advisor). (B) Mean accuracy of the HCT, the AI-as-advisor approach, and humans without AI advice across datasets. (C) Mean accuracy of the HCT, the XAI-as-advisor approach, and humans without AI advice across datasets and experimental conditions of the XAI-as-advisor. Blue boxes show the accuracy improvement of the HCT over the AI-as-advisor. Results in (B) are ranked based on this value.
  • Figure 1: Performance comparison of the hybrid confirmation tree and decision making with AI advice per dataset. The mean values of true-positive and true-negative rates (x-axis) of the hybrid confirmation tree (blue), humans with AI advice (red), and humans without AI advice (orange) across datasets. Values in blue boxes show the accuracy improvement of the hybrid confirmation tree over humans with AI advice.
  • Figure 2: Performance comparison of the hybrid confirmation tree (HCT), the AI-as-advisor approach, and humans alone, as a function of human accuracy. (A) For the HCT, the results are averages given the first individual's performance level; tiebreakers could be low, mid, or high performers. (B) Accuracy of the HCT (dots) and the AI-as-advisor approach (red line) for different levels of human accuracy and different performance levels of the tiebreaker in the HCT. Results are model estimates across the five datasets that used a within-participant design. Point estimates are based on our model; error bars correspond to the 95% HDI. Numbers show accuracy values.
  • Figure 2: The accuracy of the hybrid confirmation tree (blue) and humans with AI advice (red) against the accuracy of the AI alone (x-axis). Positive (negative) values indicate that the method performs better (worse) than the AI alone.
  • Figure 3: Performance comparison of the hybrid confirmation tree (HCT) and the AI-as-advisor approach, as a function of correct and incorrect AI decisions. (A) Accuracy of the HCT, the AI-as-advisor approach, and humans alone, for cases where the AI was correct and incorrect. (B) Human-AI agreement matrix for correct and incorrect choices (top), showing the performance of the HCT and the AI-as-advisor approach when there was human-AI disagreement and the AI was correct (bottom left) or incorrect (bottom right). Results are model estimates across all datasets; error bars correspond to the 95% HDI. Numbers show accuracy values.
  • ...and 7 more figures