Individualised Counterfactual Examples Using Conformal Prediction Intervals
James M. Adams, Gesine Reinert, Lukasz Szpruch, Carsten Maple, Andrew Elliott
TL;DR
The paper addresses how to generate informative counterfactual explanations tailored to an individual’s knowledge about a black-box binary classifier, using conformal prediction intervals to quantify uncertainty. It introduces the CPICF framework, which models an individual’s knowledge with a local classifier $h_{\theta_k}$ trained on $\mathcal{T}^{(k)}$ and selects counterfactuals by solving $\arg\min_{X'} [L^{(\mathcal{T}^{(k)})}_{info}(X') + \lambda L_{dist}(X,X')]$ subject to $h_\theta(X) \neq h_\theta(X')$, where $L^{(\mathcal{T}^{(k)})}_{info}(X)=1/C_\alpha(X)$ and $C_\alpha(X)$ is derived from conformal prediction intervals. The uncertainty in $p_\theta(X)$ is captured via locally weighted conformal predictors (LWCP) or conformalized quantile regression (CQR), and proximity is measured with a weighted Gower distance to handle mixed data types. The method is implemented with XGBoost, PUNCC, and a pymoo-based genetic optimizer. It is evaluated on a synthetic hypercube and a large fraud-detection dataset, where it improves local knowledge and data augmentation performance when the trade-off parameter $\lambda$ is appropriately chosen. These results indicate CPICF’s potential to provide personalized, informative counterfactuals and to enhance model-assisted decision making in real-world, heterogeneous data settings.
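
The selection rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it enumerates a finite candidate list instead of running a genetic optimizer, and the names `gower_distance`, `select_counterfactual`, `classify`, and `interval_width` are hypothetical. The interval-width function is assumed to be supplied by some conformal predictor and to return a strictly positive value.

```python
def gower_distance(x, y, kinds, ranges, weights=None):
    """Weighted Gower distance between two mixed-type records.

    kinds[i] is "num" or "cat"; ranges[i] is the observed range of a
    numeric feature (ignored for categorical ones).
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for xi, yi, kind, rng, w in zip(x, y, kinds, ranges, weights):
        if kind == "num":
            # Range-normalised absolute difference.
            d = abs(xi - yi) / rng if rng else 0.0
        else:
            # Simple mismatch indicator for categorical features.
            d = 0.0 if xi == yi else 1.0
        total += w * d
    return total / sum(weights)


def select_counterfactual(x, candidates, classify, interval_width, lam,
                          kinds, ranges, weights=None):
    """Return the class-flipping candidate minimising 1/width + lam * distance."""
    best, best_score = None, float("inf")
    for x_cf in candidates:
        if classify(x_cf) == classify(x):
            continue  # not a counterfactual: the predicted class is unchanged
        info = 1.0 / interval_width(x_cf)  # L_info = 1 / C_alpha (assumed > 0)
        dist = gower_distance(x, x_cf, kinds, ranges, weights)
        score = info + lam * dist
        if score < best_score:
            best, best_score = x_cf, score
    return best


# Toy usage: one numeric and one categorical feature (all values made up).
kinds, ranges = ["num", "cat"], [1.0, None]
classify = lambda r: r[0] > 0.5                # stand-in for the black-box classifier
width = lambda r: 1.0 - abs(r[0] - 0.5)        # stand-in interval width, widest near 0.5
x = (0.2, "a")
candidates = [(0.8, "a"), (0.6, "b"), (0.1, "a")]
near = select_counterfactual(x, candidates, classify, width, 1.0, kinds, ranges)
informative = select_counterfactual(x, candidates, classify, width, 0.0, kinds, ranges)
```

In this toy setting, $\lambda = 1$ favours the closer candidate and $\lambda = 0$ favours the candidate with the wider interval, which is the trade-off that $\lambda$ controls.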
Abstract
Counterfactual explanations for black-box models aim to provide insight into an algorithmic decision to its recipient. For a binary classification problem, an individual counterfactual details which features might be changed for the model to infer the opposite class. High-dimensional feature spaces that are typical of machine learning classification models admit many possible counterfactual examples to a decision, and so it is important to identify additional criteria to select the most useful counterfactuals. In this paper, we explore the idea that the counterfactuals should be maximally informative when considering the knowledge of a specific individual about the underlying classifier. To quantify this information gain, we explicitly model the knowledge of the individual, and assess the uncertainty of the predictions that the individual makes by the width of a conformal prediction interval. Regions of feature space where the prediction interval is wide correspond to areas where the confidence in decision making is low, and an additional counterfactual example might be more informative to an individual. To explore and evaluate our individualised conformal prediction interval counterfactuals (CPICFs), first we present a synthetic dataset on a hypercube which allows us to fully visualise the decision boundary, conformal intervals via three different methods, and resultant CPICFs. Second, in this synthetic dataset we explore the impact of a single CPICF on the knowledge of an individual locally around the original query. Finally, in both our synthetic dataset and a complex real-world dataset with a combination of continuous and discrete variables, we measure the utility of these counterfactuals via data augmentation, testing the performance on a held-out set.
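
To make the link between interval width and uncertainty concrete, the following is a minimal sketch of a locally weighted split conformal interval in plain Python. It is an illustration under stated assumptions, not the paper's procedure: the function names are hypothetical, the point prediction and the local spread estimate are assumed to come from already fitted models, and the calibration data are made up.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def lwcp_interval(point_pred, spread_pred, calib, alpha):
    """Locally weighted split conformal interval.

    calib is a list of (y_true, y_pred, spread) triples from a held-out
    calibration set; spread is a positive local scale estimate.
    """
    # Nonconformity score: absolute residual scaled by the local spread.
    scores = [abs(y - mu) / s for y, mu, s in calib]
    q = conformal_quantile(scores, alpha)
    return point_pred - q * spread_pred, point_pred + q * spread_pred


# Toy calibration set: residuals 0.1, ..., 0.9 with unit spread (made up).
calib = [(i / 10, 0.0, 1.0) for i in range(1, 10)]

lo_wide, hi_wide = lwcp_interval(0.4, 2.0, calib, alpha=0.5)      # large local spread
lo_narrow, hi_narrow = lwcp_interval(0.4, 0.5, calib, alpha=0.5)  # small local spread
width_wide = hi_wide - lo_wide
width_narrow = hi_narrow - lo_narrow
```

Here the interval is wider where the local spread estimate is larger, so a query point in such a region would be treated as one where an additional counterfactual example is potentially more informative.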
