Individualised Counterfactual Examples Using Conformal Prediction Intervals

James M. Adams, Gesine Reinert, Lukasz Szpruch, Carsten Maple, Andrew Elliott

TL;DR

The paper addresses how to generate informative counterfactual explanations tailored to an individual’s knowledge about a black-box binary classifier by leveraging conformal prediction intervals to quantify uncertainty. It introduces the CPICF framework, which models an individual’s knowledge with a local classifier $h_{\theta_k}$ trained on $\mathcal{T}^{(k)}$ and selects counterfactuals by optimizing $\arg\min_{X'} [L^{(\mathcal{T}^{(k)})}_{info}(X') + \lambda L_{dist}(X,X')]$ subject to $h_\theta(X) \neq h_\theta(X')$, where $L^{(\mathcal{T}^{(k)})}_{info}(X)=1/C_\alpha(X)$ and $C_\alpha(X)$ is derived from conformal prediction intervals. The uncertainty in $p_\theta(X)$ is captured via locally weighted conformal predictors (LWCP) or conformalized quantile regression (CQR), and proximity is measured with a weighted Gower distance to handle mixed data types. The method is implemented with XGBoost, PUNCC, and a pymoo-based genetic optimizer. It is evaluated on a synthetic hypercube and a large fraud-detection dataset, where it improves local knowledge and data augmentation performance when the trade-off parameter $\lambda$ is appropriately chosen. These results indicate CPICF’s potential to provide personalized, informative counterfactuals and to enhance model-assisted decision making in real-world, heterogeneous data settings.
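
The selection rule summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper uses a pymoo-based genetic optimizer, whereas this sketch simply scores a finite list of candidates. The names `gower_distance`, `select_counterfactual`, `black_box`, and `interval_width` are hypothetical, `black_box` is assumed to play the role of $h_\theta$ in the constraint, and `interval_width` is assumed to return the width of the conformal interval $C_\alpha(X')$ for the individual's model.

```python
def gower_distance(x, x_prime, is_categorical, ranges, weights):
    """Weighted Gower distance: range-scaled |a - b| for numeric features,
    0/1 mismatch for categorical ones, averaged with the given weights."""
    total = 0.0
    for a, b, cat, rng, w in zip(x, x_prime, is_categorical, ranges, weights):
        if cat:
            d = 0.0 if a == b else 1.0
        else:
            d = abs(a - b) / rng if rng > 0 else 0.0
        total += w * d
    return total / sum(weights)


def select_counterfactual(x, candidates, black_box, interval_width, lam,
                          is_categorical, ranges, weights):
    """Pick the candidate minimising L_info + lam * L_dist among those
    that flip the black-box label. Brute force over a candidate list
    (an assumption; the paper uses a genetic optimizer instead)."""
    def objective(c):
        # L_info = 1 / interval width: wide intervals are rewarded.
        l_info = 1.0 / max(interval_width(c), 1e-12)
        l_dist = gower_distance(x, c, is_categorical, ranges, weights)
        return l_info + lam * l_dist

    # Constraint: the counterfactual must receive the opposite class.
    valid = [c for c in candidates if black_box(c) != black_box(x)]
    return min(valid, key=objective) if valid else None
```

With $\lambda = 0$ the rule chases the widest interval regardless of distance, while a large $\lambda$ pulls the counterfactual back towards the original instance, which is the trade-off the summary refers to.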

Abstract

Counterfactual explanations for black-box models aim to provide insight into an algorithmic decision to its recipient. For a binary classification problem an individual counterfactual details which features might be changed for the model to infer the opposite class. High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision, and so it is important to identify additional criteria to select the most useful counterfactuals. In this paper, we explore the idea that the counterfactuals should be maximally informative when considering the knowledge of a specific individual about the underlying classifier. To quantify this information gain we explicitly model the knowledge of the individual, and assess the uncertainty of predictions which the individual makes by the width of a conformal prediction interval. Regions of feature space where the prediction interval is wide correspond to areas where the confidence in decision making is low, and an additional counterfactual example might be more informative to an individual. To explore and evaluate our individualised conformal prediction interval counterfactuals (CPICFs), first we present a synthetic data set on a hypercube which allows us to fully visualise the decision boundary, conformal intervals via three different methods, and resultant CPICFs. Second, in this synthetic data set we explore the impact of a single CPICF on the knowledge of an individual locally around the original query. Finally, in both our synthetic data set and a complex real world dataset with a combination of continuous and discrete variables, we measure the utility of these counterfactuals via data augmentation, testing the performance on a held out set.
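
To make the notion of interval width concrete, the following is a minimal sketch of a split conformal predictor with locally weighted (normalised) residual scores, one standard form of LWCP. It is an illustration under stated assumptions rather than the paper's code, which relies on PUNCC: `mu` stands for the individual's point prediction, `sigma` for a positive local spread estimate, and `calibration` for a held-out list of `(x, y)` pairs. All of these names are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample split conformal quantile: the k-th smallest score
    with k = ceil((n + 1) * (1 - alpha)); infinite if k exceeds n."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(scores)[k - 1]


def lwcp_width(x, mu, sigma, calibration, alpha):
    """Width of the interval mu(x) +/- q * sigma(x), where q is the
    conformal quantile of the normalised calibration residuals."""
    scores = [abs(y - mu(xc)) / sigma(xc) for xc, y in calibration]
    q = conformal_quantile(scores, alpha)
    return 2.0 * q * sigma(x)
```

Under this construction the width scales with `sigma(x)`, so regions where the spread estimate is large yield wide intervals, and these are the regions where an additional counterfactual would be expected to be most informative.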

Paper Structure

This paper contains 19 sections, 12 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: An overview of how the conformal prediction intervals $C_\alpha(X)$ are derived.
  • Figure 2: (A) The original classification problem with the points coloured according to their classification (orange and blue), (B) XGB classifier precision-recall curve and average precision (AP) when trained on $60\%$ of the data and tested on a $20\%$ held out fraction, and (C) the decision boundary of the classifier (dashed line).
  • Figure 3: The prediction interval width (darker colour corresponds to smaller prediction interval) for conformal predictors with $\alpha = 0.2$, based on the same training set, using (A) LWCP or (B) CQR, and (C) size of the conformal prediction set based on direct binary classification. The full training data $\mathcal{T}$ is overlaid (NB: white regions in (B) correspond to quantile crossing with conflicting (negative) prediction intervals).
  • Figure 4: Prediction intervals using LWCP for different training datasets, and the same calibration data set. (A1): for the sparse training data in (A2), (B1-2): ablating an area on the left, and (C1-2): ablating all points for $x_2>1$.
  • Figure 5: (A) The impact of the parameter $\lambda$ on the counterfactual selection, showing counterfactual instances for different $\lambda$ values on the $L_\textrm{info}$ objective function and (B) CPICFs for different starting instances for the same $L_\textrm{total}$ objective function.
  • ...and 3 more figures