Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees

Faisal Hamman, Erfaun Noorani, Saumitra Mishra, Daniele Magazzeni, Sanghamitra Dutta

TL;DR

The main contribution is to show that counterfactuals with a sufficiently high value of $\textit{Stability}$, as defined by the proposed measure, remain valid with high probability after potential $\textit{naturally-occurring}$ model changes (leveraging concentration bounds for Lipschitz functions of independent Gaussians).

Abstract

There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly. Towards finding robust counterfactuals, existing literature often assumes that the original model $m$ and the new model $M$ differ by a bounded amount in the parameter space, i.e., $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed $\textit{naturally-occurring}$ model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure, which we call $\textit{Stability}$, to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with a sufficiently high value of $\textit{Stability}$, as defined by our measure, will remain valid after potential $\textit{naturally-occurring}$ model changes with high probability (leveraging concentration bounds for Lipschitz functions of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point, which is not always available, we also examine practical relaxations of our proposed measure and demonstrate experimentally how they can be incorporated to find robust counterfactuals for neural networks that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.
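
The observation that a model can move far in the parameter space while its predictions stay the same can be illustrated with a standard fact about neural networks that is not specific to this paper: reordering the hidden units of a layer gives a different parameter vector but the same function. The sketch below is a hypothetical toy setup (a random one-hidden-layer ReLU network; the names `mlp`, `flat`, `param_dist`, and `pred_gap` are illustrative), not an experiment from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network (toy stand-in for the model m)."""
    h = np.maximum(0.0, x @ W1 + b1)
    return h @ W2 + b2

d, hidden = 5, 16
W1 = rng.normal(size=(d, hidden))
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(hidden, 1))
b2 = rng.normal(size=1)

# Reorder the hidden units: a cyclic shift is never the identity permutation.
perm = np.roll(np.arange(hidden), 1)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

def flat(*arrays):
    """Concatenate all parameters into one vector."""
    return np.concatenate([a.ravel() for a in arrays])

# Distance between the two parameter vectors (large).
param_dist = np.linalg.norm(flat(W1, b1, W2, b2) - flat(W1p, b1p, W2p, b2))

# Largest difference in predictions on random inputs (floating-point noise only).
X = rng.normal(size=(100, d))
pred_gap = np.max(np.abs(mlp(X, W1, b1, W2, b2) - mlp(X, W1p, b1p, W2p, b2)))

print(f"parameter distance: {param_dist:.3f}")
print(f"max prediction gap: {pred_gap:.2e}")
```

A bound of the form $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$ with small $\Delta$ would exclude the permuted network even though it makes identical predictions, which is the kind of case the abstract argues a parameter-space assumption fails to cover.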

Paper Structure

This paper contains 24 sections, 11 theorems, 32 equations, 4 figures, 7 tables, 2 algorithms.

Key Result

Lemma 1

For points $x_1,\ldots,x_n \in \mathcal{X}$ (lying on the data-manifold) under naturally-occurring model change, the following holds:

Figures (4)

  • Figure 1: Models can often change drastically in the parameter space causing little to no change in the actual decisions on the points on the data manifold.
  • Figure 2: Illustrates our proposed abstraction of naturally-occurring model change: The distribution of the changed model outputs $M(x)$ (stochastic) is centered around the original model output $m(x)$. The points specifically lying on the data-manifold act as anchors without much change as they exhibit lower variance in model outputs compared to points outside the manifold. This visualization also connects with the Rashomon effect, encapsulating the diverse yet similarly accurate models that can be learned from a given dataset.
  • Figure 3: Effect of stability measure on naturally-occurring model changes: (a) corresponds to the original data distribution and the trained model. (b)-(e) demonstrate some examples of changed models obtained on retraining with different weight initializations. One may notice that the model decision boundary changes considerably in the sparse regions of the data manifold (few data points), possibly violating the bounded-parameter change assumption, but the predictions on the dense regions of the data manifold do not change much (in alignment with the Rashomon effect). This motivates our proposed abstraction of naturally-occurring model change, which allows for arbitrary changes in the parameter space with little change in the actual predictions on the dense regions of the data manifold. (f) demonstrates our proposed measure of stability $\hat{R}_{k,\sigma^2}(x,m)$ (high mean model output, low variability, almost like a Gaussian filter) for which we derive probabilistic guarantees on validity. In essence, we show that under the abstraction of naturally-occurring model change, the stability measure captures the reliable intersecting region of changed models with high probability. In the original model, we observe that certain non-robust regions (i.e., those caused by overfitting to certain data points in the original model) have higher local Lipschitz values and variability. Counterfactuals assigned to these regions (even if $m(x)$ is high) would be invalidated in the changed models. The stability measure, which samples around a region, penalizes these higher local Lipschitz values.
  • Figure 4: Histograms on the HELOC dataset to visualize the proposed stability measure.
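
The caption of Figure 3 describes $\hat{R}_{k,\sigma^2}(x,m)$ as rewarding a high mean model output and low variability over points sampled around $x$. This page does not reproduce the paper's exact formula, so the sketch below assumes one plausible form: the mean output over $k$ Gaussian neighbors of $x$ (variance $\sigma^2$) minus the mean absolute deviation of those outputs from $m(x)$. The function `stability_sketch` and the two toy models are hypothetical and serve only to show the behavior described in the caption.

```python
import numpy as np

def stability_sketch(model, x, k=1000, sigma=0.1, rng=None):
    """Assumed form of a sampling-based stability score (not the paper's code).

    Draws k neighbors from N(x, sigma^2 I) and returns
    mean(m(x_i)) - mean(|m(x_i) - m(x)|).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    neighbors = x + sigma * rng.standard_normal((k, x.shape[0]))
    outputs = np.array([model(z) for z in neighbors])
    return float(outputs.mean() - np.abs(outputs - model(x)).mean())

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Two toy models with the same output at x = 0 but very different local slopes.
def smooth_model(z):
    return sigmoid(z[0] + 2.0)

def steep_model(z):
    return sigmoid(50.0 * z[0] + 2.0)

x = np.zeros(2)
smooth_score = stability_sketch(smooth_model, x, rng=np.random.default_rng(0))
steep_score = stability_sketch(steep_model, x, rng=np.random.default_rng(0))

print(f"m(x) for both models: {smooth_model(x):.3f}")
print(f"score, smooth model:  {smooth_score:.3f}")
print(f"score, steep model:   {steep_score:.3f}")
```

Both toy models assign the same output to $x$, yet the steep one receives a much lower score because its output varies sharply in the neighborhood. This mirrors the caption's point that a high $m(x)$ alone does not make a counterfactual robust when the local Lipschitz value is large.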

Theorems & Definitions (27)

  • Definition 1: $\gamma$-Lipschitz
  • Definition 2: Closest Counterfactual $\mathcal{C}_{p}(x,m)$
  • Definition 3: Closest Data-Manifold Counterfactual $\mathcal{C}_{p,\mathcal{X}}(x,m)$
  • Definition 4: Local Outlier Factor (Breunig et al., 2000)
  • Definition 5: Naturally-Occurring Model Change
  • Lemma 1: Connection to Rashomon Effect
  • Remark 1: Targeted Model Change
  • Definition 6: Stability
  • Remark 2: Relaxations to local Lipschitz
  • Theorem 1: Probabilistic Guarantee
  • ...and 17 more