Towards Human-AI Complementarity with Prediction Sets

Giovanni De Toni, Nastaran Okati, Suhas Thejaswi, Eleni Straitouri, Manuel Gomez-Rodriguez

TL;DR

This paper introduces a simple and efficient greedy algorithm that, for a large class of expert models and non-conformity scores, is guaranteed to find prediction sets that provably offer equal or greater performance than those constructed using conformal prediction.

Abstract

Decision support systems based on prediction sets have proven to be effective at helping human experts solve classification tasks. Rather than providing single-label predictions, these systems provide sets of label predictions constructed using conformal prediction, namely prediction sets, and ask human experts to predict label values from these sets. In this paper, we first show that the prediction sets constructed using conformal prediction are, in general, suboptimal in terms of average accuracy. Then, we show that the problem of finding the optimal prediction sets under which the human experts achieve the highest average accuracy is NP-hard. More strongly, unless P = NP, we show that the problem is hard to approximate to any factor less than the size of the label set. However, we introduce a simple and efficient greedy algorithm that, for a large class of expert models and non-conformity scores, is guaranteed to find prediction sets that provably offer equal or greater performance than those constructed using conformal prediction. Further, using a simulation study with both synthetic and real expert predictions, we demonstrate that, in practice, our greedy algorithm finds near-optimal prediction sets offering greater performance than conformal prediction.
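For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of standard split conformal prediction, not the paper's greedy algorithm. It assumes the simple non-conformity score $s(x, y) = 1 - \hat{p}(y \mid x)$; the function names, the toy calibration data, and the value of $\alpha$ are all hypothetical and chosen only for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold for the assumed score s(x, y) = 1 - p(y | x).

    cal_probs: list of per-label probability lists from a classifier (assumed).
    cal_labels: list of ground-truth label indices for the calibration set.
    """
    n = len(cal_labels)
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    # Finite-sample corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is included.
        return float("inf")
    return scores[k - 1]

def prediction_set(probs, q_hat):
    """Labels whose non-conformity score does not exceed the threshold."""
    return [y for y, p in enumerate(probs) if 1.0 - p <= q_hat]

# Toy calibration data (hypothetical): 4 instances, 3 labels.
cal_probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
cal_labels = [0, 1, 2, 1]

q_hat = conformal_threshold(cal_probs, cal_labels, alpha=0.25)
test_set = prediction_set([0.5, 0.35, 0.15], q_hat)
```

In this setup the expert would be asked to choose among the labels in `test_set` rather than among all labels. A smaller `alpha` gives a larger threshold and therefore larger sets.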

Paper Structure

This paper contains 19 sections, 6 theorems, 16 equations, 6 figures, 5 tables, and 1 algorithm.

Key Result

Theorem 1

The problem of finding the optimal prediction set, as defined in Eq. (eq:goal), is $\mathsf{NP}$-hard.

Figures (6)

  • Figure 1: Our automated decision support system. Given an instance with a feature vector $x$, the system $\mathcal{C}$ helps the expert by automatically narrowing down the set of potential label values to a prediction set ${\mathcal{S}}(x) \subseteq \mathcal{Y}$. The system asks the expert to predict a label value $\hat{y}$ from ${\mathcal{S}}(x)$.
  • Figure 2: (Left) Confusion matrix $\bm{C}$ for the predictions made by a (simulated) human expert on their own. The label $\bar{y} = \mathop{\mathrm{argmax}}_{y'\neq y} C_{y'y}$ that is most frequently mistaken for the ground-truth label $y$ is highlighted in red for $y\in\left\{0,2,6,8\right\}$. (Right) Empirical conditional probability that a prediction set includes $\{y, \bar{y}\}$ given $Y=y$ with conformal prediction (Naive, Aps, Raps and Saps) and our greedy algorithm (Greedy). In both panels, $\gamma = 0.7$ and $\mathbb{P}(Y' = Y) = 0.7$.
  • Figure 3: Complementary cumulative distribution (cCDF) of the per-image test accuracy achieved by a simulated human expert using the prediction sets constructed with conformal prediction (Naive, Aps, Raps and Saps) and our greedy algorithm (Greedy) on the ImageNet16H dataset.
  • Figure 4: Empirical conditional probability that a prediction set includes $\{y, \bar{y}\}$ given $Y=y$ with conformal prediction (Naive, Aps, Raps and Saps) and our greedy algorithm (Greedy).
  • Figure 5: Average accuracy achieved by a simulated expert following the mixture of MNLs and by real human experts using the prediction sets constructed with all possible conformal predictors, each with a different $\alpha$ value, using the choice of calibration set by Straitouri et al. [straitouri2024designing]. We highlight in red the highest average accuracy for both the simulated and the real humans.
  • ...and 1 more figure
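
The definition $\bar{y} = \mathop{\mathrm{argmax}}_{y'\neq y} C_{y'y}$ used in the Figure 2 and Figure 4 captions can be illustrated with a short sketch. It assumes, following the subscript order, that rows of the confusion matrix index the expert's prediction $y'$ and columns index the ground-truth label $y$; the function name and the toy matrix are hypothetical.

```python
def most_confused_labels(C):
    """For each ground-truth label y, return the label y' != y with the
    largest entry C[y'][y].

    Assumes C[y_pred][y_true] holds counts (or frequencies) and that C is
    square with at least two labels. Ties resolve to the smallest label index.
    """
    n = len(C)
    return [
        max((yp for yp in range(n) if yp != y), key=lambda yp: C[yp][y])
        for y in range(n)
    ]

# Toy 3-label confusion matrix (hypothetical counts).
C = [
    [8, 1, 0],
    [2, 7, 3],
    [0, 2, 7],
]

y_bar = most_confused_labels(C)
```

Here `y_bar[y]` is the label most often predicted in place of `y`, so the pair `{y, y_bar[y]}` is the one whose joint inclusion in a prediction set the captions measure.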

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3