Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees

Faisal Hamman; Erfaun Noorani; Saumitra Mishra; Daniele Magazzeni; Sanghamitra Dutta

TL;DR

The main contribution is to show that counterfactuals with a sufficiently high value of $\textit{Stability}$, as defined by the proposed measure, remain valid with high probability after potential $\textit{naturally-occurring}$ model changes (leveraging concentration bounds for Lipschitz functions of independent Gaussians).

Abstract

There is an emerging interest in generating robust counterfactual explanations that would remain valid if the model is updated or changed even slightly. Towards finding robust counterfactuals, existing literature often assumes that the original model $m$ and the new model $M$ are bounded in the parameter space, i.e., $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$. However, models can often change significantly in the parameter space with little to no change in their predictions or accuracy on the given dataset. In this work, we introduce a mathematical abstraction termed $\textit{naturally-occurring}$ model change, which allows for arbitrary changes in the parameter space such that the change in predictions on points that lie on the data manifold is limited. Next, we propose a measure -- that we call $\textit{Stability}$ -- to quantify the robustness of counterfactuals to potential model changes for differentiable models, e.g., neural networks. Our main contribution is to show that counterfactuals with a sufficiently high value of $\textit{Stability}$, as defined by our measure, will remain valid after potential $\textit{naturally-occurring}$ model changes with high probability (leveraging concentration bounds for Lipschitz functions of independent Gaussians). Since our quantification depends on the local Lipschitz constant around a data point, which is not always available, we also examine practical relaxations of our proposed measure and demonstrate experimentally how they can be incorporated to find robust counterfactuals for neural networks that are close, realistic, and remain valid after potential model changes. This work also has interesting connections with model multiplicity, also known as the Rashomon effect.
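The claim that parameters can move far while predictions barely change can be illustrated with a standard fact about neural networks: reordering the hidden units of a layer yields a different parameter vector but the same function. The sketch below is only an illustration of that fact, not the paper's construction; the network sizes, the helper names `mlp` and `flatten`, and the random weights are all assumptions made for the example.

```python
import math
import random

def mlp(x, W1, b1, w2, b2):
    """One-hidden-layer ReLU network with a sigmoid output in [0, 1]."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

def flatten(W1, b1, w2, b2):
    """Stack all parameters into one vector, playing the role of Params(m)."""
    return [w for row in W1 for w in row] + list(b1) + list(w2) + [b2]

rng = random.Random(0)
d, h = 3, 8  # hypothetical input and hidden sizes
W1 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
b1 = [rng.gauss(0, 1) for _ in range(h)]
w2 = [rng.gauss(0, 1) for _ in range(h)]
b2 = rng.gauss(0, 1)

# Reverse the order of the hidden units: new parameter vector, same function.
perm = list(reversed(range(h)))
W1_p = [W1[i] for i in perm]
b1_p = [b1[i] for i in perm]
w2_p = [w2[i] for i in perm]

# Euclidean distance between the two parameter vectors.
param_dist = math.dist(flatten(W1, b1, w2, b2), flatten(W1_p, b1_p, w2_p, b2))

# Largest disagreement between the two networks on random inputs.
points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(100)]
max_output_gap = max(
    abs(mlp(x, W1, b1, w2, b2) - mlp(x, W1_p, b1_p, w2_p, b2)) for x in points
)

print(f"parameter distance: {param_dist:.3f}")
print(f"max output gap:     {max_output_gap:.2e}")
```

Here the parameter distance is large while the output gap is zero up to floating-point rounding, so a bound of the form $\|\text{Params}(M){-}\text{Params}(m)\|{<}\Delta$ with small $\Delta$ would exclude a model that makes identical predictions.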

Paper Structure (24 sections, 11 theorems, 32 equations, 4 figures, 7 tables, 2 algorithms)


Introduction
Related Works
Preliminaries
Background on Counterfactuals
Main Theoretical Contributions
Naturally-Occurring Model Change
A Measure of Robustness With Probabilistic Guarantees on Validity
Proposed Measure: Stability
Probabilistic Guarantee
Practical Relaxation of Stability and Its Properties
Impossibility Under Targeted Model Change
Generating Robust Counterfactuals using Our Proposed Measure: Stability
Experiments
Discussion
Disclaimer
...and 9 more sections

Key Result

Lemma 1

For points $x_1,\ldots,x_n \in \mathcal{X}$ (lying on the data-manifold) under naturally-occurring model change, the following holds:

Figures (4)

Figure 1: Models can often change drastically in the parameter space causing little to no change in the actual decisions on the points on the data manifold.
Figure 2: Illustrates our proposed abstraction of naturally-occurring model change: The distribution of the changed model outputs $M(x)$ (stochastic) is centered around the original model output $m(x)$. The points specifically lying on the data-manifold act as anchors without much change as they exhibit lower variance in model outputs compared to points outside the manifold. This visualization also connects with the Rashomon effect, encapsulating the diverse yet similarly accurate models that can be learned from a given dataset.
Figure 3: Effect of stability measure on naturally-occurring model changes: (a) corresponds to the original data distribution and the trained model. (b)-(e) demonstrate some examples of changed models obtained on retraining with different weight initializations. One may notice that the model decision boundary is changing a lot in the sparse regions of the data-manifold (few data-points), possibly violating the bounded-parameter change assumption but the predictions on the dense regions of the data-manifold do not change much (in alignment with Rashomon effect). This motivates our proposed abstraction of naturally-occurring model change which allows for arbitrary changes in the parameter space with little change in the actual predictions on the dense regions of the data manifold. (f) demonstrates our proposed measure of stability $\hat{R}_{k,\sigma^2}(x,m)$ (high mean model output, low variability, almost like a Gaussian filter) for which we derive probabilistic guarantees on validity. In essence, we show that under the abstraction of naturally-occurring model change, the stability measure captures the reliable intersecting region of changed models with high probability. In the original model, we observe that certain non-robust regions (i.e., those caused by overfitting to certain data points in the original model) have higher local Lipschitz values and variability. Counterfactuals assigned to these regions (even if $m(x)$ is high) would be invalidated in the changed models. The stability measure, which samples around a region, penalizes these higher local Lipschitz values.
Figure 4: Histograms on the HELOC dataset to visualize the proposed stability measure.
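The Figure 3 caption describes the relaxed stability measure $\hat{R}_{k,\sigma^2}(x,m)$ as rewarding a high mean model output and low variability over a region sampled around $x$. A minimal sketch of one way to compute such a score follows, assuming it takes the form "sample mean minus sample standard deviation of $m$ over $k$ Gaussian samples with variance $\sigma^2$ around $x$". That exact form, the function name `stability`, the default values, and the toy classifier are assumptions for illustration and are not taken from the text above.

```python
import math
import random
import statistics

def stability(x, model, k=1000, sigma=0.1, seed=0):
    """Assumed form: mean minus std of model outputs at k Gaussian neighbours of x."""
    rng = random.Random(seed)
    outputs = [
        model([xi + rng.gauss(0.0, sigma) for xi in x])  # sample from N(x, sigma^2 I)
        for _ in range(k)
    ]
    # High mean output is rewarded, high variability is penalized.
    return statistics.fmean(outputs) - statistics.pstdev(outputs)

def toy_model(x):
    """Hypothetical smooth classifier: sigmoid of a linear score."""
    return 1.0 / (1.0 + math.exp(-(4.0 * x[0] + 4.0 * x[1])))

# A point far inside the positive region versus one barely past the boundary.
deep_inside = stability([1.0, 1.0], toy_model)
near_boundary = stability([0.05, 0.05], toy_model)

print(f"deep inside:   {deep_inside:.3f}")
print(f"near boundary: {near_boundary:.3f}")
```

Both points are classified positive by the toy model, but the one near the boundary receives a much lower score because its neighbours have a lower and more variable output. A counterfactual search could accept a candidate only when its score exceeds a user-chosen threshold, which is left unspecified here.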

Theorems & Definitions (27)

Definition 1: $\gamma$-Lipschitz
Definition 2: Closest Counterfactual $\mathcal{C}_{p}(x,m)$
Definition 3: Closest Data-Manifold Counterfactual $\mathcal{C}_{p,\mathcal{X}}(x,m)$
Definition 4: Local Outlier Factor (Breunig et al., 2000)
Definition 5: Naturally-Occurring Model Change
Lemma 1: Connection to Rashomon Effect
Remark 1: Targeted Model Change
Definition 6: Stability
Remark 2: Relaxations to local Lipschitz
Theorem 1: Probabilistic Guarantee
...and 17 more
