Designing Decision Support Systems Using Counterfactual Prediction Sets

Eleni Straitouri; Manuel Gomez Rodriguez

TL;DR

The paper addresses decision support for multiclass classification, where a classifier's predictions are imperfect, through systems that show the expert a prediction set built by a conformal predictor and require the expert to choose a label from that set. It develops a causal framework, with interventional and counterfactual monotonicity assumptions, to reason about expert predictions, and introduces Counterfactual Successive Elimination (Counterfactual SE), an online bandit algorithm that exploits the nested structure of prediction sets to achieve expected regret $O(\sqrt{t \log m \log T})$. A large-scale human subject study ($n=2{,}751$, $194{,}407$ predictions) on ImageNet16H-PS shows that restricting experts to the prediction set yields higher accuracy than letting them always exercise their own agency, and that the counterfactual methods outperform the baselines. The code and data are open source. The approach requires no expert model, and the authors suggest it may apply to broader decision tasks and to RCPS-style extensions.

Abstract

Decision support systems for classification tasks are predominantly designed to predict the value of the ground truth labels. However, since their predictions are not perfect, these systems also need to make human experts understand when and how to use these predictions to update their own predictions. Unfortunately, this has been proven challenging. In this context, it has been recently argued that an alternative type of decision support systems may circumvent this challenge. Rather than providing a single label prediction, these systems provide a set of label prediction values constructed using a conformal predictor, namely a prediction set, and forcefully ask experts to predict a label value from the prediction set. However, the design and evaluation of these systems have so far relied on stylized expert models, questioning their promise. In this paper, we revisit the design of this type of systems from the perspective of online learning and develop a methodology that does not require, nor assumes, an expert model. Our methodology leverages the nested structure of the prediction sets provided by any conformal predictor and a natural counterfactual monotonicity assumption to achieve an exponential improvement in regret in comparison to vanilla bandit algorithms. We conduct a large-scale human subject study ($n = 2{,}751$) to compare our methodology to several competitive baselines. The results show that, for decision support systems based on prediction sets, limiting experts' level of agency leads to greater performance than allowing experts to always exercise their own agency. We have made available the data gathered in our human subject study as well as an open source implementation of our system at https://github.com/Networks-Learning/counterfactual-prediction-sets.
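The nested structure mentioned in the abstract can be illustrated with a minimal sketch. It assumes a standard split-conformal construction with the score $1 - f_y(x)$, where $f_y(x)$ is a classifier's softmax output; the function names, the synthetic data, and the choice of score are assumptions for illustration and are not taken from the authors' implementation. Under this construction, a larger $\alpha$ gives a smaller threshold, so the prediction set for a larger $\alpha$ is always contained in the set for a smaller $\alpha$.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split-conformal threshold on the (assumed) score s(x, y) = 1 - f_y(x)."""
    m = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(m), cal_labels])
    k = int(np.ceil((m + 1) * (1.0 - alpha)))  # rank of the empirical quantile
    # If the rank exceeds m, use threshold 1 so that every label is included.
    return 1.0 if k > m else float(scores[max(k, 1) - 1])


def prediction_set(probs, q_hat):
    """Labels whose score 1 - f_y(x) does not exceed the threshold q_hat."""
    return set(np.flatnonzero(1.0 - probs <= q_hat).tolist())


def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def nested_sets_demo(seed=0, m=200, n_labels=16):
    """Prediction sets for one synthetic sample under increasing alpha."""
    rng = np.random.default_rng(seed)
    cal_probs = softmax(rng.normal(size=(m, n_labels)))   # synthetic calibration outputs
    cal_labels = rng.integers(0, n_labels, size=m)        # synthetic calibration labels
    test_probs = softmax(rng.normal(size=n_labels))       # one synthetic test sample
    alphas = [0.05, 0.2, 0.5, 0.8]
    return [
        prediction_set(test_probs, conformal_threshold(cal_probs, cal_labels, a))
        for a in alphas
    ]


if __name__ == "__main__":
    for s in nested_sets_demo():
        print(sorted(s))
```

Each set printed by the demo is a subset of the one before it. This nesting is the property that lets an observation made under one value of $\alpha$ carry information about other values.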

Paper Structure (14 sections, 1 theorem, 23 equations, 12 figures, 2 algorithms)

Introduction
Our Contributions
Further Related Work
Decision Support Systems based on Prediction Sets
Prediction Sets through a Causal Lens
Finding the Optimal Conformal Predictor using Counterfactual Prediction Sets
Evaluation via Human Subject Study
Discussion and Limitations
Conclusions
Proof of Theorem 1
Additional Details about the Human Subject Study Setup
Expert Success Probability vs. Prediction Set Size
Sensitivity Analysis to Violations of the Counterfactual Monotonicity Assumption
Expert Success Probability under the Strict and Lenient Implementation of our Systems

Key Result

Theorem 1

Given a calibration set $\mathcal{D}_{\text{cal}} = \{ (x_i, y_i) \}_{i=1}^{m}$ and a maximum number of rounds $T \geq \sqrt{m}$, Counterfactual SE is guaranteed to achieve expected regret $\mathbb{E}[R(t)] \leq O \left( \sqrt{t \log m \log T} \right)$ for any $t \leq T$.
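
To put the abstract's claim of an exponential improvement over vanilla bandit algorithms in context, the bound can be set next to the textbook guarantee for successive elimination over independent arms. This is a sketch under stated assumptions: the vanilla rate is the standard one from the bandit literature rather than a statement taken from this paper, constants are omitted, and it is assumed that each candidate conformal predictor counts as one arm and that there are on the order of $m$ of them.

$$
\underbrace{\mathbb{E}[R(t)] \leq O\left(\sqrt{m \, t \log T}\right)}_{\text{vanilla successive elimination, } m \text{ arms (assumed)}}
\qquad \text{vs.} \qquad
\underbrace{\mathbb{E}[R(t)] \leq O\left(\sqrt{t \log m \log T}\right)}_{\text{Counterfactual SE (Theorem 1)}}
$$

Under these assumptions, the dependence on $m$ drops from $\sqrt{m}$ to $\sqrt{\log m}$, while the dependence on $t$ and $T$ is unchanged.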

Figures (12)

Figure 1: Our automated decision support system $\mathcal{C}$. Given a sample with a feature vector $x$, the system $\mathcal{C}$ helps the expert by automatically narrowing down the set of potential label values to a subset of them $\mathcal{C}(x) \subseteq \mathcal{Y}$, which we refer to as a prediction set, using a set-valued predictor. The system forcefully asks the expert to predict a label value $\hat{y}_{\mathcal{C}}$ from the prediction set $\mathcal{C}(x)$, i.e., $\hat{y}_{\mathcal{C}} \in \mathcal{C}(x)$.
Figure 2: Empirical average regret achieved by six different bandit algorithms across $30$ different realizations. The standard error is not visible as it is always below $0.2$.
Figure 3: Empirical success probability achieved by all experts across all images using the strict and lenient implementation of our system $\mathcal{C}_{\alpha}$ with different $\alpha$ values. For the strict implementation, we annotate the optimal $\alpha$ value, the $\alpha$ values found by the algorithms by Straitouri et al. [1] and by counterfactual UCB1, as well as the average success probability achieved by the set of $\alpha$ values that remain active after running counterfactual SE. For the lenient implementation, we annotate the $\alpha$ value used by Babbar et al. [2]. The average accuracy of the classifier used by both the strict and the lenient implementation of our system is $0.848$ and the empirical success probability achieved by the experts on their own is $0.760$. The shaded areas correspond to a $95\%$ confidence interval.
Figure 4: The consent form, including a detailed description of the study procedures, that Prolific workers had to read and complete in order to participate in our human subject study. The procedures describe the use of the decision support systems under the strict implementation. The consent form continues in Figures 5 and 6.
Figure 5: Consent form continued.
...and 7 more figures

Theorems & Definitions (1)

Theorem 1
