Interval Abstractions for Robust Counterfactual Explanations

Junqi Jiang; Francesco Leofante; Antonio Rago; Francesca Toni

TL;DR

The paper proposes a novel interval abstraction technique for parametric machine learning models that yields provable robustness guarantees for counterfactual explanations (CEs) under a possibly infinite set of plausible model changes $\Delta$, and it formalises the corresponding robustness notion, called $\Delta$-robustness.

Abstract

Counterfactual Explanations (CEs) have emerged as a major paradigm in explainable AI research, providing recourse recommendations for users affected by the decisions of machine learning models. However, CEs found by existing methods often become invalid when slight changes occur in the parameters of the model they were generated for. The literature lacks a way to provide exhaustive robustness guarantees for CEs under model changes: existing methods for improving CEs' robustness are mostly heuristic, and their robustness is evaluated empirically using only a limited number of retrained models. To bridge this gap, we propose a novel interval abstraction technique for parametric machine learning models, which allows us to obtain provable robustness guarantees for CEs under a possibly infinite set of plausible model changes $\Delta$. Based on this idea, we formalise a robustness notion for CEs, which we call $\Delta$-robustness, in both binary and multi-class classification settings. We present procedures to verify $\Delta$-robustness based on Mixed Integer Linear Programming, and we use them to propose algorithms that generate $\Delta$-robust CEs. In an extensive empirical study involving neural networks and logistic regression models, we demonstrate the practical applicability of our approach. We discuss two strategies for determining the appropriate hyperparameters in our method, and we quantitatively benchmark CEs generated by eleven methods, highlighting the effectiveness of our algorithms in finding robust CEs.
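To make the idea of checking a CE against a whole set of model changes concrete, the following is a minimal sketch and not the paper's MILP procedure. It assumes a logistic regression model whose weights and bias may each shift by at most `delta` (an infinity-norm reading of $\Delta$), in which case plain interval arithmetic on the logit already gives exact bounds. The function names `logit_bounds` and `is_delta_robust` are hypothetical and chosen for illustration only.

```python
import numpy as np


def logit_bounds(x, w, b, delta):
    """Bounds on w'.x + b' when every weight and the bias may move by at most delta.

    Assumption: plausible model changes are modelled as a per-parameter
    (infinity-norm) perturbation of size delta.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    centre = float(w @ x + b)
    # Each weight w_i contributes at most delta * |x_i|; the bias contributes delta.
    radius = delta * (float(np.abs(x).sum()) + 1.0)
    return centre - radius, centre + radius


def is_delta_robust(x_ce, w, b, delta, target=1):
    """True if x_ce keeps the target class for every model in the perturbation box."""
    lo, hi = logit_bounds(x_ce, w, b, delta)
    # sigmoid(z) > 0.5 exactly when z > 0, so it suffices to look at the logit.
    return lo > 0.0 if target == 1 else hi < 0.0


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.5
    # Valid on the original model (logit about 0.1) but close to the boundary.
    print(is_delta_robust([0.6, 0.5], w, b, delta=0.1))
    # Further from the boundary (logit 1.5).
    print(is_delta_robust([2.0, 0.5], w, b, delta=0.1))
```

For neural networks the output is no longer linear in the parameters, which is why a tighter encoding such as the MILP formulation mentioned in the abstract is needed; naive interval propagation would still be sound there but generally looser.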

Paper Structure

This paper contains 35 sections, 2 theorems, 13 equations, 9 figures, 10 tables, 4 algorithms.

Introduction
Related work
Counterfactual explanations
Robustness of counterfactual explanations
Robustness and verification in machine learning
Background
Notation
Classification models
Counterfactual explanations
Interval abstractions and robustness for binary classification
Plausible model changes $\Delta$
Interval abstractions
$\Delta$-robustness
Interval abstractions and robustness for multi-class classification
Computing $\Delta$-robustness with MILP
...and 20 more sections

Key Result

Lemma 1

Consider a classification model $\mathcal{M}_{\theta}$ and a set of plausible model shifts $\Delta$ with threshold $\delta$ and $0 \leq p \leq \infty$. Then, $\forall \mathcal{M}_{\theta'} = S(\mathcal{M}_{\theta})$, $S \in \Delta$, and $\forall \theta_i$, $\theta'_i$, $i\in [d]$, we have $\theta'_i \ldots$

Figures (9)

Figure 1: Illustration of Example \ref{ex:interval_ffnn}.
Figure 2: Visual representation of Definition \ref{def:classification_interval}. In (a), $\mathcal{I}_{(\theta,\Delta)}$ classifies an input as $1$ because the output range for that input is always greater than $0.5$. In (b), the output range includes the value $0.5$, so the classification result is undefined. In (c), analogously to (a), the output range is always below $0.5$ and the input is classified as $0$.
Figure 3: Visual representation of Definition \ref{def:multi_classification_interval} instantiated with three colour-coded classes: orange, green and blue. In (a), $\mathcal{I}_{(\theta,\Delta)}$ classifies an input as $o$ since the output range for that class is always greater than those of all other classes. In (b), the output range for $o$ overlaps with that for $b$, so the classification result is undefined. In (c), analogously to (a), $\mathcal{I}_{(\theta,\Delta)}$ classifies an input as $b$.
Figure 4: Illustration of Example \ref{ex:interval_ffnn_multi}.
Figure 5: RNCE behaviours when $\texttt{optimal}=\texttt{F}$, $\Delta=\emptyset$ (\ref{fig:a}); $\texttt{optimal}=\texttt{T}$, $\Delta=\emptyset$ (\ref{fig:b}); $\texttt{optimal}=\texttt{F}$, $\Delta\neq\emptyset$ (\ref{fig:c}); $\texttt{optimal}=\texttt{T}$, $\Delta\neq\emptyset$ (\ref{fig:d}). Each figure depicts the CE that is chosen (green cross) among a set of candidate CEs (grey crosses) for a given input (red circle) and model $\mathcal{M}_{\theta}$. The continuous curved line represents the decision boundary of $\mathcal{M}_{\theta}$; the dashed line represents a possible change in the decision boundary under $\Delta$. Figures \ref{fig:a} and \ref{fig:b} show configurations under which RNCE may return CEs that are not $\Delta$-robust, whereas Figures \ref{fig:c} and \ref{fig:d} depict robust ones.
...and 4 more figures
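The classification behaviour described in the captions of Figures 2 and 3 can be sketched as a mapping from output intervals to a label or to "undefined". This is an illustrative reading of the captions only: the function names are hypothetical, `None` stands in for the undefined result, and the multi-class rule compares interval endpoints, which is a conservative simplification that may differ from the paper's formal definition.

```python
from typing import Dict, Optional, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound) of a model output


def classify_binary(out: Interval, threshold: float = 0.5) -> Optional[int]:
    """Label from an output range, following the three cases in Figure 2."""
    lo, hi = out
    if lo > threshold:   # whole range above 0.5: class 1
        return 1
    if hi < threshold:   # whole range below 0.5: class 0
        return 0
    return None          # range contains 0.5: undefined


def classify_multi(outs: Dict[str, Interval]) -> Optional[str]:
    """Label whose range lies strictly above every other class's range, if any."""
    for label, (lo, _) in outs.items():
        others = (hi for other, (_, hi) in outs.items() if other != label)
        if all(lo > hi_other for hi_other in others):
            return label
    return None          # some ranges overlap: undefined


if __name__ == "__main__":
    print(classify_binary((0.6, 0.9)))
    print(classify_binary((0.4, 0.7)))
    print(classify_multi({"orange": (0.6, 0.8), "green": (0.1, 0.2), "blue": (0.0, 0.3)}))
    print(classify_multi({"orange": (0.3, 0.6), "green": (0.0, 0.1), "blue": (0.4, 0.7)}))
```

Returning `None` rather than guessing a label keeps the check sound: a CE is only reported as robust when every model covered by the interval agrees on its class.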

Theorems & Definitions (33)

Example 1
Example 2
Definition 1
Definition 2
Example 3
Example 4
Definition 3
Example 5
Definition 4
Definition 5
...and 23 more
