Conformal Prediction Sets Improve Human Decision Making

Jesse C. Cresswell, Yi Sui, Bhargava Kumar, Noël Vouitsis

TL;DR

This paper demonstrates that conformal prediction sets, which provide calibrated, uncertainty-aware outputs, improve human decision making across diverse tasks. In a pre-registered randomized trial with control, top-$k$, and conformal treatments, it shows that conformal sets yield higher human accuracy than fixed-size top-$k$ sets with the same coverage. The work attributes the gains to smaller average set sizes and explicit uncertainty signaling, with adoption rates closely matching the reported coverage. Practically, the findings support integrating conformal prediction into human-in-the-loop systems to improve decision quality, while noting variability across tasks and the need to weigh the effects on response speed and fairness.
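For reference, one standard split-conformal construction is sketched below. It is an assumption for illustration, not necessarily the exact procedure used in the trial. All notation is introduced here: $\hat p(y \mid x)$ is the model's predicted probability, $s(x, y) = 1 - \hat p(y \mid x)$ is the nonconformity score, $r(x, y)$ is the position of $y$ when labels are sorted by decreasing $\hat p(y \mid x)$, there are $n$ held-out calibration pairs, and $\alpha$ is the miscoverage level.

```latex
\begin{aligned}
\hat q &= s_{(\lceil (n+1)(1-\alpha) \rceil)}, &
\mathcal{C}(x) &= \{\, y : s(x, y) \le \hat q \,\}, &
\Pr\bigl(Y_{n+1} \in \mathcal{C}(X_{n+1})\bigr) &\ge 1 - \alpha, \\
\hat k &= r_{(\lceil (n+1)(1-\alpha) \rceil)}, &
\mathcal{T}(x) &= \text{the } \hat k \text{ labels with largest } \hat p(y \mid x), &
\Pr\bigl(Y_{n+1} \in \mathcal{T}(X_{n+1})\bigr) &\ge 1 - \alpha.
\end{aligned}
```

Here $s_{(j)}$ and $r_{(j)}$ are the $j$-th smallest calibration scores and ranks. The guarantees hold when calibration and test points are exchangeable, and the threshold is taken as $+\infty$ (the full label set) if $\lceil (n+1)(1-\alpha) \rceil > n$. Both sets meet the same marginal coverage target. Only $\mathcal{C}(x)$ changes size with the model's confidence on each input.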

Abstract

In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets, their accuracy on tasks improves compared to when they are given fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.
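To make the comparison between conformal and fixed-size sets concrete, here is a minimal sketch. It assumes split conformal prediction with the score $1 - \hat p(y \mid x)$, and a top-$k$ baseline whose $k$ is calibrated with the label-rank score so that both constructions target the same $1-\alpha$ coverage. This is not the authors' code. The function names, the synthetic softmax data, and $\alpha = 0.1$ are hypothetical choices for illustration, and the trial may have used a different score function or calibration procedure.

```python
import math
import random

# Hypothetical sketch: split conformal sets vs. fixed-size top-k sets,
# both calibrated to the same 1 - alpha marginal coverage target.

def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest score (inf if that rank exceeds n)."""
    ordered = sorted(scores)
    n = len(ordered)
    rank = math.ceil((n + 1) * (1 - alpha))
    return math.inf if rank > n else ordered[rank - 1]

def label_order(probs):
    """Labels sorted by decreasing predicted probability (ties keep index order)."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)

def conformal_set(probs, qhat):
    # Keep every label whose score 1 - p_y is within the threshold; the set may be empty.
    return [y for y, p in enumerate(probs) if 1.0 - p <= qhat]

def topk_set(probs, k):
    # Fixed-size set: the k most probable labels.
    return label_order(probs)[:k]

def calibrate(cal_probs, cal_labels, alpha):
    """Return (qhat, k): the conformal threshold and a conformally calibrated top-k size."""
    qhat = conformal_quantile([1.0 - p[y] for p, y in zip(cal_probs, cal_labels)], alpha)
    # Rank score: 1-based position of the true label in the model's ranking.
    ranks = [label_order(p).index(y) + 1 for p, y in zip(cal_probs, cal_labels)]
    k = conformal_quantile(ranks, alpha)
    n_classes = len(cal_probs[0])
    return qhat, (n_classes if k == math.inf else k)

def evaluate(sets, labels):
    """Empirical coverage and average set size."""
    coverage = sum(y in s for s, y in zip(sets, labels)) / len(labels)
    avg_size = sum(len(s) for s in sets) / len(sets)
    return coverage, avg_size

def synthetic_data(n, n_classes, seed):
    """Toy softmax outputs with labels drawn from them (a perfectly calibrated model)."""
    rng = random.Random(seed)
    probs, labels = [], []
    for _ in range(n):
        logits = [rng.gauss(0.0, 2.0) for _ in range(n_classes)]
        top = max(logits)
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        p = [e / total for e in exps]
        probs.append(p)
        labels.append(rng.choices(range(n_classes), weights=p)[0])
    return probs, labels

if __name__ == "__main__":
    alpha = 0.1
    cal_p, cal_y = synthetic_data(1000, 10, seed=0)
    test_p, test_y = synthetic_data(2000, 10, seed=1)
    qhat, k = calibrate(cal_p, cal_y, alpha)
    conf_cov, conf_size = evaluate([conformal_set(p, qhat) for p in test_p], test_y)
    topk_cov, topk_size = evaluate([topk_set(p, k) for p in test_p], test_y)
    print(f"conformal: coverage={conf_cov:.3f}, avg size={conf_size:.2f}")
    print(f"top-{k}:   coverage={topk_cov:.3f}, avg size={topk_size:.2f}")
```

Both constructions reach roughly the same coverage. The conformal sets shrink on confident inputs and grow on uncertain ones, while every top-$k$ set has the same size. Comparing the printed average sizes shows which construction is smaller for a given model. This particular score can return an empty set when no label is probable enough.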

Paper Structure

This paper contains 34 sections, 5 equations, 18 figures, and 7 tables.

Figures (18)

  • Figure 1: Top: Humans express uncertainty through explicit signalling, and by offering alternatives. Bottom: Conformal prediction allows machine learning models to do the same by outputting prediction sets with size calibrated to model uncertainty. Larger sets signal greater uncertainty and provide alternative answers.
  • Figure 2: Main trial screen shown to participants for ObjectNet with conformal set treatment. The correct answer is given only after the participant responds.
  • Figure 3: Human performance (accuracy) across three tasks and three treatments. Data is shown as mean accuracy, while error bars show unbiased standard errors ($N=50$).
  • Figure 4: Human performance (speed) across three tasks and three treatments. Data is shown as mean response time (s), while error bars show unbiased standard errors ($N=50$).
  • Figure 5: Accuracy by prediction set size on GoEmotions.
  • ...and 13 more figures
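Figures 3 and 4 report means with unbiased standard errors over $N = 50$, and Figure 5 breaks accuracy down by prediction set size. The following is a minimal sketch of both summaries. It assumes that each of the $N$ observations is one participant's mean score, and that "unbiased" refers to the standard error $s/\sqrt{N}$ with the Bessel-corrected sample variance $s^2 = \frac{1}{N-1}\sum_i (x_i - \bar x)^2$. The function names and toy values are hypothetical and are neither the study's analysis code nor its data.

```python
import math
import statistics
from collections import defaultdict

def mean_and_sem(values):
    """Mean and standard error of the mean, s / sqrt(N), where s is the
    Bessel-corrected (N - 1 denominator) sample standard deviation."""
    if len(values) < 2:
        raise ValueError("need at least two observations for a standard error")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def accuracy_by_set_size(trials):
    """trials: iterable of (set_size, is_correct) pairs, one per answered question.
    Returns {set_size: (accuracy, n_trials)} ordered by set size."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for size, is_correct in trials:
        total[size] += 1
        correct[size] += int(is_correct)
    return {size: (correct[size] / total[size], total[size]) for size in sorted(total)}

if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    per_participant_accuracy = [0.6, 0.7, 0.8, 0.9]
    print(mean_and_sem(per_participant_accuracy))
    print(accuracy_by_set_size([(1, True), (1, False), (3, True)]))
```

The same `mean_and_sem` helper applies unchanged to per-participant mean response times for a speed plot like Figure 4.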