Rethinking Distance Metrics for Counterfactual Explainability

Joshua Nathaniel Williams, Anurag Katakkar, Hoda Heidari, J. Zico Kolter

TL;DR

This work investigates a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution, and derives a distance metric, tailored for counterfactual similarity, that can be applied to a broad range of settings.

Abstract

Counterfactual explanations have been a popular method of post-hoc explainability across a variety of settings in machine learning. Such methods explain classifiers by generating new data points that are similar to a given reference but receive a more desirable prediction. In this work, we investigate a framing for counterfactual generation methods that considers counterfactuals not as independent draws from a region around the reference, but as jointly sampled with the reference from the underlying data distribution. Through this framing, we derive a distance metric, tailored for counterfactual similarity, that can be applied to a broad range of settings. Through both quantitative and qualitative analyses of counterfactual generation methods, we show that this framing allows us to express more nuanced dependencies among the covariates.
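
To make the joint-sampling framing concrete, here is a minimal sketch under an assumption of our own: the data distribution is Gaussian, $\mathcal{N}(\mu, \Sigma)$, and the reference $\mathbf{x}$ and counterfactual $\mathbf{x}'$ are drawn jointly with cross-covariance $\alpha\Sigma$ for $0 \le \alpha < 1$. Conditioning on the reference then gives $\mathbf{x}' \mid \mathbf{x} \sim \mathcal{N}(\mu + \alpha(\mathbf{x} - \mu), (1 - \alpha^2)\Sigma)$, and its negative log-density can serve as a distance. This is an illustrative derivation, not necessarily the paper's exact metric; the functions `conditional_params` and `joint_sampling_distance` are hypothetical.

```python
import numpy as np


def conditional_params(x, mu, sigma, alpha):
    """Mean and covariance of x' given x, assuming (x, x') are jointly Gaussian
    with marginals N(mu, sigma) and cross-covariance alpha * sigma."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    mean = mu + alpha * (x - mu)      # reference shrunk toward the data mean
    cov = (1.0 - alpha ** 2) * sigma  # conditional covariance
    return mean, cov


def joint_sampling_distance(x_cf, x, mu, sigma, alpha):
    """Hypothetical distance: -2 log p(x_cf | x) up to an additive constant,
    i.e. a squared Mahalanobis distance to the conditional mean."""
    mean, cov = conditional_params(x, mu, sigma, alpha)
    diff = x_cf - mean
    return float(diff @ np.linalg.solve(cov, diff))


mu, sigma = np.zeros(2), np.eye(2)
x_ref = np.array([2.0, 0.0])
# The reference itself is not the closest point under this metric...
print(joint_sampling_distance(x_ref, x_ref, mu, sigma, alpha=0.5))  # about 1.333
# ...the conditional mean, pulled toward the data mean, is.
print(joint_sampling_distance(np.array([1.0, 0.0]), x_ref, mu, sigma, alpha=0.5))  # 0.0
```

Under this assumed model, as $\alpha \to 1$ the conditional mean approaches $\mathbf{x}$ and the distance becomes a Mahalanobis distance (Euclidean when $\Sigma = I$) scaled by $1/(1 - \alpha^2)$; smaller $\alpha$ pulls the preferred counterfactual toward the bulk of the data.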

Paper Structure

This paper contains 31 sections, 51 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Comparison of approaches to counterfactual generation; counterfactuals with the proposed prior never leave the data distribution. (Black Dot) Reference, $\mathbf{x}$. (Green) Counterfactual Distribution. (Black Line) Desired predicted output, $y' = A \mathbf{x}' + b$. In all figures, $L$ is the precision of residuals, $\gamma$ is the weight on $l_{2}$ distance, and $\alpha$ controls similarity/distance in our approach.
  • Figure 2: The trained classifier assigns every image 99% confidence for the desired class. Our proposed approach produces counterfactual images that lie further from the reference images than those generated using the L2 distance, yet exhibit more semantically meaningful features associated with each class. Our approach also avoids the class mixing observed when traversing the VAE's latent space.
  • Figure 3: Preference matrices for survey responses on each dataset. Each cell shows how often respondents preferred the row method to the column method; darker colors indicate a stronger preference. Each method appears to excel on different types of data.
  • Figure 4: (a) PGM underlying counterfactual generation with a regularizer that encourages in-distribution counterfactuals. (b,c,d,e,f - Black) Reference for the counterfactual. (b,c,d,f,g,h - Black Line) Desired predicted output, $y' = 10$, for the regression problem $y = 2 x_{1} - 3 x_{2} + 5$. (b,c,d,e,f - Green) Distribution entailed by Eq. \ref{eq:opt_reg}, where $L$ is the inverse variance of the residuals and $S$ parameterizes the weighted Euclidean distance between counterfactuals and reference. In all panels, $I$ is the identity matrix and the desired predicted output is $y' = 10$ (black line). The underlying data distribution is a standard Gaussian in (b,c) and $\mathbf{x} \sim \mathcal{N}\left(\begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 4.04 & -7.80 \\ -7.80 & 17.00 \end{bmatrix}\right)$ in (d,e,f).
  • Figure 5: Example MNIST counterfactuals for different distance metrics. As $\alpha \rightarrow 1$, our counterfactuals are analogous to those generated with the Euclidean distance. Importantly, as we decrease $\alpha$, i.e., decrease the reliance on similarity to the reference, the generated images move closer to prototypical examples of the desired class rather than devolving into adversarial examples.
  • ...and 3 more figures
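
The captions for Figures 1 and 4 describe the baseline as a distribution over counterfactuals controlled by the residual precision $L$ and a distance weight ($\gamma$ in Figure 1, $S$ in Figure 4). A minimal sketch, assuming the objective is a squared residual toward $y'$ plus a weighted squared distance to the reference (our assumption; the paper's exact objective may differ), shows that the entailed distribution is Gaussian with precision $L A^\top A + S$. The function `l2_counterfactual_gaussian` is hypothetical, and placing the reference at the origin is our choice for illustration.

```python
import numpy as np


def l2_counterfactual_gaussian(x_ref, A, b, y_target, L, S):
    """Gaussian p(x') proportional to exp(-E(x')) for the assumed energy
    E(x') = L/2 * ||A x' + b - y'||^2 + 1/2 * (x' - x)^T S (x' - x)."""
    A = np.atleast_2d(A)
    b = np.atleast_1d(b)
    y_target = np.atleast_1d(y_target)
    precision = L * A.T @ A + S                 # Hessian of E
    rhs = L * A.T @ (y_target - b) + S @ x_ref  # from setting grad E = 0
    mean = np.linalg.solve(precision, rhs)
    cov = np.linalg.inv(precision)
    return mean, cov


# Figure 4 regression y = 2 x1 - 3 x2 + 5 with target y' = 10 (S = gamma * I, gamma = 1).
A = np.array([[2.0, -3.0]])
mean, cov = l2_counterfactual_gaussian(np.zeros(2), A, 5.0, 10.0, L=100.0, S=np.eye(2))
print(A @ mean + 5.0)  # close to 10
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=5)  # draws from the entailed distribution
```

Because $S = \gamma I$ ignores the data covariance, this baseline spreads counterfactuals along the line $y' = 10$ regardless of where the data actually lie, which is the behaviour Figure 1 contrasts with the proposed prior.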

Theorems & Definitions (2)

  • Definition 2.1: Counterfactual Explanations
  • Definition F.1: Structural Causal Model