
Rigorous Probabilistic Guarantees for Robust Counterfactual Explanations

Luca Marzari, Francesco Leofante, Ferdinando Cicalese, Alessandro Farinelli

TL;DR

This paper addresses the robustness of counterfactual explanations (CFXs) for deep neural networks under plausible model shifts (PMS). It proves that exactly deciding robustness under PMS is NP-complete and introduces AP$\Delta$S, a scalable probabilistic certification method that uses Monte Carlo sampling and Wilks bounds to estimate, with high confidence, the fraction of PMS under which a counterfactual remains valid. The authors show, both theoretically and in experiments on four binary classification datasets, that probabilistic guarantees focused on PMS can outperform worst-case approaches, producing robust, plausible CFXs with better proximity. The work also provides a practical framework for generating robust explanations and, for completeness, discusses exact enumeration and MILP-based certification.
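The sampling idea can be illustrated with a minimal sketch. It is not the authors' AP$\Delta$S implementation: it assumes, hypothetically, that a plausible model shift perturbs every weight and bias of a small ReLU network independently and uniformly within $\pm\delta$, and it applies the classic one-sided Wilks (order-statistics) bound. With $n$ i.i.d. sampled shifts, the probability that at least a fraction $R$ of the sampling distribution yields an output no lower than the sample minimum is $1 - R^n$, so $n = \lceil \ln(1-\eta) / \ln R \rceil$ samples suffice for confidence $\eta$. The helper names (`wilks_sample_size`, `sample_shift`, `certify_cfx`) and the toy network are illustrative only.

```python
import math
import random
from typing import List, Tuple

# Toy parameters: (hidden weights, hidden biases, output weights, output bias).
Params = Tuple[List[List[float]], List[float], List[float], float]


def wilks_sample_size(rate: float, confidence: float) -> int:
    """Smallest n with 1 - rate**n >= confidence (one-sided Wilks bound)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(rate))


def forward(params: Params, x: List[float]) -> float:
    """Sigmoid output of a one-hidden-layer ReLU network."""
    W1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def sample_shift(params: Params, delta: float, rng: random.Random) -> Params:
    """Hypothetical PMS sampler: each parameter moves uniformly within +/- delta."""
    def shift(v: float) -> float:
        return v + rng.uniform(-delta, delta)

    W1, b1, w2, b2 = params
    return ([[shift(w) for w in row] for row in W1], [shift(b) for b in b1],
            [shift(w) for w in w2], shift(b2))


def certify_cfx(params: Params, cfx: List[float], delta: float,
                rate: float = 0.99, confidence: float = 0.99,
                seed: int = 0) -> Tuple[bool, float, int]:
    """Return (certified, worst sampled output, number of samples).

    If the minimum sampled output exceeds 0.5, then with probability at least
    `confidence`, at least a fraction `rate` of the sampled shift distribution
    still assigns the counterfactual to class 1.
    """
    rng = random.Random(seed)
    n = wilks_sample_size(rate, confidence)
    worst = min(forward(sample_shift(params, delta, rng), cfx) for _ in range(n))
    return worst > 0.5, worst, n


# Hypothetical toy network and counterfactual.
PARAMS: Params = ([[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0], [1.0, 1.0], -0.5)
CFX = [2.0, 2.0]
```

With $R = \eta = 0.99$ the bound requires 459 samples regardless of network size, which is what keeps this style of estimate scalable; the price is that the guarantee is probabilistic and holds with respect to the assumed sampling distribution over shifts.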

Abstract

We study the problem of assessing the robustness of counterfactual explanations for deep learning models. We focus on plausible model shifts altering model parameters and propose a novel framework to reason about the robustness property in this setting. To motivate our solution, we begin by showing for the first time that computing the robustness of counterfactuals with respect to plausible model shifts is NP-complete. As this rules out, in practice, the existence of scalable algorithms for exactly computing robustness, we propose a novel probabilistic approach that provides tight estimates of robustness with strong guarantees while preserving scalability. Remarkably, and unlike existing solutions targeting plausible model shifts, our approach imposes no requirements on the network to be analyzed, thus enabling robustness analysis on a wider range of architectures. Experiments on four binary classification datasets indicate that our method improves the state of the art in generating robust explanations, outperforming existing methods on a range of metrics.

Paper Structure

This paper contains 21 sections, 10 theorems, 6 equations, 10 figures, 2 tables, 4 algorithms.

Key Result

Theorem 1

Deciding DRP is NP-complete.

Figures (10)

  • Figure 1: (a) The model $\mathcal{M}_{\theta}$ used as an example to prove the lemma. (b) An interval neural network representing the realizations that can be obtained from $\mathcal{M}_{\theta}$ considering a set of PMS $\Delta_{\delta}$ with $\delta = 0.3$.
  • Figure 2: Visual representation of the reachable output set for an interval abstraction of a binary classification model. (a) For a given ${\Delta}$, an input is classified as $1$ (robust) if its output range is always greater than $0.5$. (b), (c) Otherwise, the input is classified as $0$, i.e., not robust.
  • Figure 3: The interval neural network used for exact enumeration.
  • Figure 4: Average robust $\delta$ obtained using MILP and AP$\Delta$S.
  • Figure 5: The DNN considered in this proof.
  • ...and 5 more figures
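The interval abstraction in Figures 1 and 2 can be sketched with plain interval arithmetic. This is a minimal illustration under stated assumptions, not the paper's algorithm: it assumes that the PMS set $\Delta_{\delta}$ replaces every weight and bias $p$ with the interval $[p-\delta, p+\delta]$, as in the interval neural network of Figure 1, and that the model is a one-hidden-layer ReLU network with a sigmoid output. Because the sigmoid is monotone, requiring the output range to stay above $0.5$ is equivalent to requiring the lower bound of the logit to be positive. The function names and the toy network are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]
Params = Tuple[List[List[float]], List[float], List[float], float]


def widen(p: float, delta: float) -> Interval:
    """Interval of values a parameter can take under a shift of at most delta."""
    return (p - delta, p + delta)


def iadd(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def imul(a: Interval, b: Interval) -> Interval:
    """Interval product: the extremes lie among the four endpoint products."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(products), max(products))


def relu(a: Interval) -> Interval:
    return (max(0.0, a[0]), max(0.0, a[1]))


def interval_layer(W: List[List[float]], b: List[float],
                   xs: List[Interval], delta: float) -> List[Interval]:
    """Affine layer whose weights and biases each lie in [p - delta, p + delta]."""
    out = []
    for row, bias in zip(W, b):
        acc = widen(bias, delta)
        for w, x in zip(row, xs):
            acc = iadd(acc, imul(widen(w, delta), x))
        out.append(acc)
    return out


def certify_interval(params: Params, cfx: List[float],
                     delta: float) -> Tuple[bool, Interval]:
    """Return (certified, logit interval) for the counterfactual cfx."""
    W1, b1, w2, b2 = params
    xs = [(v, v) for v in cfx]  # the counterfactual is a single point
    hidden = [relu(h) for h in interval_layer(W1, b1, xs, delta)]
    (logit,) = interval_layer([w2], [b2], hidden, delta)
    # sigmoid(z) > 0.5 iff z > 0, so checking the lower bound suffices.
    return logit[0] > 0.0, logit


# Hypothetical toy network and counterfactual.
PARAMS: Params = ([[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0], [1.0, 1.0], -0.5)
CFX = [2.0, 2.0]
```

For this toy network, $\delta = 0.1$ yields a logit interval of about $[2.1, 5.1]$ (certified), whereas $\delta = 1.0$ yields $[-1.5, 28.5]$ (not certified). In general, interval propagation can over-approximate the reachable output set (for example, when hidden neurons share upstream parameters), so a negative answer means only that the abstraction could not certify robustness.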

Theorems & Definitions (20)

  • Definition 1
  • Definition 2 (Jiang et al., 2023)
  • Definition 3 (Jiang et al., 2023)
  • Definition 4
  • Definition 5 (Hamman et al., 2023): NOMS
  • Definition 6 (Jiang et al., 2023): PMS
  • Definition 7
  • Theorem 1
  • Theorem 2
  • Lemma 3
  • ...and 10 more