When combinations of humans and AI are useful: A systematic review and meta-analysis

Michelle Vaccaro, Abdullah Almaatouq, Thomas Malone

TL;DR

A systematic review and meta-analysis found that, on average, human–AI combinations performed significantly worse than the best of humans or AI alone. It also found performance losses in decision-making tasks and significantly greater gains in content creation tasks.

Abstract

Inspired by the increasing use of AI to augment humans, researchers have studied human-AI systems involving different tasks, systems, and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here, we addressed this question by conducting a meta-analysis of over 100 recent experimental studies reporting over 300 effect sizes. First, we found that, on average, human-AI combinations performed significantly worse than the best of humans or AI alone. Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when the AI outperformed humans alone we found losses. These findings highlight the heterogeneity of the effects of human-AI collaboration and point to promising avenues for improving human-AI systems.

Paper Structure (35 sections, 8 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Forest plots of all effect sizes ($k=370$) included in the meta-analysis. The positions of the points on the $x$-axes represent the values of the effect sizes, and the bars indicate the 95% confidence interval for the effect sizes. The colors of the points and lines correspond to the values of the effect sizes, with negative effect sizes colored red and positive effect sizes colored green. The black dotted line corresponds to an effect size of Hedges' $g = 0$, which means that the human-AI system performed the same as the baseline. The circle at the bottom of the graph represents the meta-analytic average effect size and confidence interval.
  • Figure 2: Results from the three-level meta-regression models for the moderator variables. Here, $N$ is the number of effect sizes included for each moderator subgroup level, shown alongside the estimated effect size and its 95% confidence interval. The symbols in front of each moderator indicate whether there is a statistically significant difference between the subgroups for human-AI synergy (*) and human augmentation ($^{\wedge}$).
  • Figure S1: PRISMA flow diagram for the literature review and study inclusion process. $^*$Article retracted from journal. Adapted from Page et al. (2021).
  • Figure S2: Descriptive statistics for the effect sizes in our analysis.
  • Figure S3: Forest plots of all effect sizes ($n=370$) included in the meta-analysis for AI augmentation and negative synergy. The positions of the points on the $x$-axes represent the values of the effect sizes, and the bars indicate the 95% confidence interval for the effect sizes. The colors of the points and lines correspond to the values of the effect sizes, with negative effect sizes (no human-AI synergy) colored red and positive effect sizes (human-AI synergy) colored green. The black dotted line corresponds to an effect size of Hedges' $g = 0$, which means that the human-AI group performed the same as the baseline. The circle at the bottom of the graph represents the meta-analytic average effect size and confidence interval.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Definition 1: Human-AI Synergy
  • Definition 2: Human Augmentation
  • Definition 3: AI Augmentation
  • Definition 4: Negative Synergy
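
The figure captions above report each result as a Hedges' $g$ with a 95% confidence interval, where $g = 0$ means the human-AI system performed the same as its baseline. As a rough illustration of how such an effect size could be computed for one study, the sketch below uses the standard textbook formulas for Hedges' $g$ and its large-sample variance; it is not taken from the paper. The function names, the summary statistics, and the assumption that higher scores are better are all hypothetical. The baseline choice follows the abstract's wording: human-AI synergy compares the combination against the better of humans alone and AI alone, while human augmentation is assumed here to compare against humans alone.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, var_g) for a treatment group versus a comparison group.

    Uses the pooled standard deviation and the usual small-sample
    correction J = 1 - 3 / (4 * df - 1), with df = n_t + n_c - 2.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction
    # Large-sample approximation to the variance of d
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


def ci95(g, var_g):
    """Normal-approximation 95% confidence interval for g."""
    half_width = 1.96 * math.sqrt(var_g)
    return g - half_width, g + half_width


# Hypothetical summary statistics (mean, sd, n) for one study; higher is better.
human_alone = (0.70, 0.12, 60)
ai_alone = (0.78, 0.10, 60)
human_ai = (0.75, 0.11, 60)

# Synergy baseline: whichever of humans alone or AI alone has the higher mean.
best_alone = max(human_alone, ai_alone, key=lambda cond: cond[0])

g_synergy, var_synergy = hedges_g(*human_ai, *best_alone)
g_augment, var_augment = hedges_g(*human_ai, *human_alone)

print("synergy g:", round(g_synergy, 3), "95% CI:", ci95(g_synergy, var_synergy))
print("augmentation g:", round(g_augment, 3), "95% CI:", ci95(g_augment, var_augment))
```

With these made-up numbers the combination beats humans alone but not the AI alone, so the augmentation effect size is positive while the synergy effect size is negative. This only shows how the two measures can diverge for a single study. The paper's pooled estimates come from three-level meta-regression models, which this sketch does not attempt to reproduce.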