Conformal Prediction Sets Improve Human Decision Making

Jesse C. Cresswell; Yi Sui; Bhargava Kumar; Noël Vouitsis

TL;DR

This paper shows that conformal prediction sets, whose size is calibrated to model uncertainty, improve human decision making across three tasks: image classification, sentiment analysis, and named entity recognition. In a pre-registered randomized controlled trial with control, top-$k$, and conformal treatments, participants given conformal sets were more accurate than those given fixed-size sets with the same coverage guarantee. The work attributes the gains to smaller average set sizes and explicit uncertainty signalling, with adoption rates closely matching the reported coverage. Practically, the findings support integrating conformal prediction into human-in-the-loop systems to improve decision quality, while noting that effects vary across tasks and that speed and fairness implications need careful consideration.

Abstract

In response to everyday queries, humans explicitly signal uncertainty and offer alternative answers when they are unsure. Machine learning models that output calibrated prediction sets through conformal prediction mimic this human behaviour; larger sets signal greater uncertainty while providing alternatives. In this work, we study the usefulness of conformal prediction sets as an aid for human decision making by conducting a pre-registered randomized controlled trial with conformal prediction sets provided to human subjects. With statistical significance, we find that when humans are given conformal prediction sets their accuracy on tasks improves compared to fixed-size prediction sets with the same coverage guarantee. The results show that quantifying model uncertainty with conformal prediction is helpful for human-in-the-loop decision making and human-AI teams.
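To make the comparison in the abstract concrete, the following is a minimal sketch of how a conformal prediction set and a fixed-size top-$k$ set can be built from a classifier's softmax outputs. It assumes standard split conformal prediction with the score "one minus the probability of the true label"; the paper may use a different score function, and the function names and the toy probabilities below are hypothetical, chosen only for illustration.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Score threshold from a held-out calibration set (split conformal)."""
    n = len(cal_labels)
    # Nonconformity score: one minus the probability given to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample correction: take the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = int(np.ceil((n + 1) * (1 - alpha)))
    if rank > n:
        return np.inf  # too few calibration points: every label gets included
    return np.sort(scores)[rank - 1]

def conformal_sets(probs, threshold):
    """Boolean mask (inputs x classes): True where a label is in the set."""
    return (1.0 - probs) <= threshold

def top_k_sets(probs, k):
    """Fixed-size baseline: the k most probable labels for every input."""
    mask = np.zeros_like(probs, dtype=bool)
    top = np.argsort(-probs, axis=1)[:, :k]
    np.put_along_axis(mask, top, True, axis=1)
    return mask

# Toy calibration data (assumed values, not from the paper).
cal_probs = np.array([[0.90, 0.05, 0.05],
                      [0.10, 0.80, 0.10],
                      [0.20, 0.20, 0.60],
                      [0.50, 0.25, 0.25]])
cal_labels = np.array([0, 1, 2, 0])
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.25)

# One confident input and one ambiguous input.
test_probs = np.array([[0.90, 0.05, 0.05],
                       [0.50, 0.50, 0.00]])
print("conformal set sizes:", conformal_sets(test_probs, threshold).sum(axis=1))
print("top-2 set sizes:    ", top_k_sets(test_probs, 2).sum(axis=1))
```

The conformal sets grow only for inputs on which the model is unsure, whereas the top-$k$ baseline shows the same number of labels regardless of confidence. That difference in set size is the uncertainty signal the trial evaluates.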

Paper Structure (34 sections, 5 equations, 18 figures, 7 tables)

Introduction
Background and Related Work
Uncertainty Quantification
Prediction Sets and Coverage
Conformal Prediction
Human-in-the-Loop Conformal Prediction
Method
Experiments and Evaluation
Tasks, Datasets, and Models
Image Classification
Sentiment Analysis
Named Entity Recognition (NER)
Experiment Design
Results
Human Performance Measurement
...and 19 more sections

Figures (18)

Figure 1: Top: Humans express uncertainty through explicit signalling, and by offering alternatives. Bottom: Conformal prediction allows machine learning models to do the same by outputting prediction sets with size calibrated to model uncertainty. Larger sets signal greater uncertainty and provide alternative answers.
Figure 2: Main trial screen shown to participants for ObjectNet with conformal set treatment. The correct answer is given only after the participant responds.
Figure 3: Human performance (accuracy) across three tasks and three treatments. Data is shown as mean accuracy, while error bars show unbiased standard errors ($N=50$).
Figure 4: Human performance (speed) across three tasks and three treatments. Data is shown as mean response time (s), while error bars show unbiased standard errors ($N=50$).
Figure 5: Accuracy by prediction set size on GoEmotions.
...and 13 more figures
