Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory

Pasan Dissanayake; Sanghamitra Dutta

TL;DR

This work analyzes how model reconstruction using counterfactuals can be improved by exploiting the fact that counterfactuals lie close to the decision boundary. It proposes the Counterfactual Clamping Attack (CCA), a model reconstruction strategy that trains a surrogate model with a unique loss function that treats counterfactuals differently from ordinary instances.

Abstract

Counterfactual explanations provide ways of achieving a favorable model outcome with minimum input perturbation. However, counterfactual explanations can also be leveraged to reconstruct the model by strategically training a surrogate model to give predictions similar to those of the original (target) model. In this work, we analyze how model reconstruction using counterfactuals can be improved by further leveraging the fact that the counterfactuals also lie quite close to the decision boundary. Our main contribution is to use polytope theory to derive novel theoretical relationships between the error in model reconstruction and the number of counterfactual queries required. Our theoretical analysis leads us to propose a strategy for model reconstruction that we call Counterfactual Clamping Attack (CCA), which trains a surrogate model using a unique loss function that treats counterfactuals differently from ordinary instances. Our approach also alleviates the related problem of decision boundary shift that arises in existing model reconstruction approaches when counterfactuals are treated as ordinary instances. Experimental results demonstrate that our strategy improves fidelity between the target and surrogate model predictions on several datasets.
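The abstract does not spell out the CCA loss, so the following is only a minimal sketch of what "treating counterfactuals differently" could look like, under stated assumptions: ordinary instances get a standard binary cross-entropy, while counterfactuals are penalized only when the surrogate's output falls below a threshold `k` (assumed here to be 0.5, the decision boundary). The function names `bce` and `clamping_loss`, the threshold, and the one-sided form are all hypothetical choices for illustration, not the authors' stated definition.

```python
import numpy as np


def bce(p, y, eps=1e-12):
    """Element-wise binary cross-entropy between prediction p and target y."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def clamping_loss(pred, label, is_cf, k=0.5):
    """Hypothetical one-sided loss (an assumption, not the paper's formula).

    pred  : surrogate outputs in (0, 1)
    label : target-model labels for ordinary instances (ignored for counterfactuals)
    is_cf : boolean mask, True where the point is a counterfactual
    k     : assumed clamping threshold (the decision boundary level)
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    is_cf = np.asarray(is_cf, dtype=bool)

    # Ordinary instances: plain cross-entropy against the target's label.
    ordinary = bce(pred, label)

    # Counterfactuals: penalize only while the surrogate output is below k;
    # subtracting bce(k, k) makes the penalty reach zero exactly at pred == k.
    cf = np.where(pred < k, bce(pred, k) - bce(k, k), 0.0)

    return float(np.where(is_cf, cf, ordinary).mean())
```

In this sketch a counterfactual that the surrogate already places on the favorable side contributes no loss, so it is not pushed further from the boundary. That is one way to avoid the decision boundary shift the abstract attributes to treating counterfactuals as ordinary labeled points.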

Paper Structure (28 sections, 19 theorems, 18 equations, 21 figures, 6 tables, 1 algorithm)


Introduction
Preliminaries
Main Results
Convex decision boundaries and closest counterfactuals
ReLU networks and closest counterfactuals
Beyond closest counterfactuals
Experiments
Conclusion
Proof of Theoretical Results
Proof of Lemma \ref{lem_tangent} and Lemma \ref{lem_closestPointNormal}
Proof of Theorem \ref{thm_randomPolytopeComplexity}
Proof of Lemma \ref{lem_cpwlDecisionBoundary}
Proof of Theorem \ref{thm_cpwlComplexity}
Proof of Theorem \ref{thm_reluLipschitzClamp} and Corollary \ref{cor_lipschitzClamp}
Experimental Details and Additional Results
...and 13 more sections

Key Result

Lemma 2.6

Let $\mathcal{S}(\boldsymbol{x})=0$ and $\mathcal{T}(\boldsymbol{x})=0$ denote two differentiable hypersurfaces in $\mathbb{R}^d$, touching each other at point $\boldsymbol{w}$. Then, $\mathcal{S}(\boldsymbol{x})=0$ and $\mathcal{T}(\boldsymbol{x})=0$ have a common tangent hyperplane at $\boldsymbol{w}$.
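
To make the statement concrete, here is a small worked case chosen for illustration (it is not taken from the paper). Take $d=2$, $\mathcal{S}(\boldsymbol{x}) = x_1^2 + x_2^2 - 1$ and $\mathcal{T}(\boldsymbol{x}) = x_2 - 1$. The unit circle and the horizontal line touch at $\boldsymbol{w} = (0, 1)$, and their gradients at that point are parallel:

$$\nabla \mathcal{S}(\boldsymbol{w}) = (2w_1, 2w_2) = (0, 2), \qquad \nabla \mathcal{T}(\boldsymbol{w}) = (0, 1).$$

The tangent hyperplane of each surface at $\boldsymbol{w}$ is the set of points orthogonal to its gradient there:

$$\{\boldsymbol{x} : \nabla \mathcal{S}(\boldsymbol{w})^\top (\boldsymbol{x} - \boldsymbol{w}) = 0\} = \{\boldsymbol{x} : x_2 = 1\} = \{\boldsymbol{x} : \nabla \mathcal{T}(\boldsymbol{w})^\top (\boldsymbol{x} - \boldsymbol{w}) = 0\}.$$

Both surfaces therefore share the tangent hyperplane $x_2 = 1$ at $\boldsymbol{w}$, as the lemma asserts.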

Figures (21)

Figure 1: Decision boundary shift when counterfactuals are treated as ordinary labeled points.
Figure 2: Problem setting
Figure 3: Polytope approximation of a convex decision boundary using the closest counterfactuals.
Figure 4: Approximating a concave region requires denser queries than approximating a convex region.
Figure 5: $\mathcal{N}_\epsilon$ grid and inverse counterfactual regions. Thick solid lines indicate the decision boundary pieces (${\mathbb{H}}_i$'s). White depicts the accepted region. The pale-colored areas are the inverse counterfactual regions of the ${\mathbb{H}}_i$'s with the matching color. In this case, $k(\epsilon)=7$ and $v^*(\epsilon)$ is the area of the lower amber region.
...and 16 more figures

Theorems & Definitions (35)

Definition 2.1: Counterfactual Generating Mechanism
Definition 2.2: Closest Counterfactual
Definition 2.3: Inverse Counterfactual Region
Definition 2.4: Fidelity \cite{ulrichModelExtract}
Definition 2.5: Hypersurface \cite{manifoldsLee}
Definition 2.6: Touching Hypersurfaces
Lemma 2.6
Lemma 3.0
Theorem 3.1
Remark 3.2: Relaxing the Convexity Assumption
...and 25 more
