Don't Explain Noise: Robust Counterfactuals for Randomized Ensembles

Alexandre Forel, Axel Parmentier, Thibaut Vidal

TL;DR

This work addresses the fragility of counterfactual explanations for classifiers that are implemented as randomized ensembles. It formulates robustness to algorithmic uncertainty as a probabilistic constraint and derives a simple deterministic threshold $\tau(N,\alpha)$, with $p_{N,\alpha}^* = g_N^{-1}(\alpha)$, that yields robust counterfactual explanations with the same computational cost as naive methods. The authors provide theoretical guarantees for ensembles of convex base learners and finite-sample bounds for convex approximations, along with practical sample-average approximations (Direct-SAA and Robust-SAA). Empirical results on real datasets show that naive counterfactuals lack robustness (validity often below $0.5$, sometimes near $0.2$), while the proposed methods deliver high robustness with modest increases in counterfactual distance, and robustness correlates with feature predictive importance. The framework offers a principled, scalable approach to provide reliable algorithmic recourse in the presence of ensemble randomness, with clear guidance on when robust counterfactual explanations are necessary.

Abstract

Counterfactual explanations describe how to modify a feature vector in order to flip the outcome of a trained classifier. Obtaining robust counterfactual explanations is essential to provide valid algorithmic recourse and meaningful explanations. We study the robustness of explanations of randomized ensembles, which are always subject to algorithmic uncertainty even when the training data is fixed. We formalize the generation of robust counterfactual explanations as a probabilistic problem and show the link between the robustness of ensemble models and the robustness of base learners. We develop a practical method with good empirical performance and support it with theoretical guarantees for ensembles of convex base learners. Our results show that existing methods give surprisingly low robustness: the validity of naive counterfactuals is below $50\%$ on most data sets and can fall to $20\%$ on problems with many features. In contrast, our method achieves high robustness with only a small increase in the distance from counterfactual explanations to their initial observations.

Paper Structure (29 sections, 8 theorems, 34 equations, 12 figures, 3 tables)


Key Result

Lemma

Given $N \in \mathbb{N}$, the map $g_N: [0, 1] \to [0, 1], p \mapsto B\left(N/2 ; N, p\right)$ is decreasing and invertible.
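
Because $g_N$ is decreasing and invertible, the threshold $p_{N,\alpha}^* = g_N^{-1}(\alpha)$ mentioned in the TL;DR can be recovered numerically by bisection. The sketch below is illustrative only and rests on two assumptions not stated in this excerpt: that $B(k; N, p)$ denotes the binomial cumulative distribution function $P(X \le k)$ for $X \sim \mathrm{Bin}(N, p)$, and that $N/2$ is rounded down when $N$ is odd. The function names `binom_cdf`, `g`, and `invert_g` are hypothetical helpers, not taken from the paper.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); k is floored (assumed convention)."""
    k = math.floor(k)
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def g(n, p):
    """The map g_N(p) = B(N/2; N, p), assuming B is the binomial CDF."""
    return binom_cdf(n / 2, n, p)


def invert_g(n, alpha, tol=1e-10):
    """Bisection for p with g(n, p) = alpha, relying on g being decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(n, mid) > alpha:
            lo = mid  # g is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n, alpha = 100, 0.05
    p_star = invert_g(n, alpha)
    print(f"N={n}, alpha={alpha}: p* is approximately {p_star:.4f}")
```

Bisection is used here only because the lemma guarantees monotonicity; any one-dimensional root finder would serve equally well under the same assumptions.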

Figures (12)

  • Figure 1: Sensitivity of the robustness threshold $p_{N, \alpha}^*$.
  • Figure 2: Initial observation and counterfactual explanations for increasing robustness target ($1-\alpha$).
  • Figure 3: Validity of robust counterfactuals as a function of the robustness target $(1-\alpha)$.
  • Figure 4: Trade-off between the distance and robustness of counterfactual explanations.
  • Figure 5: Average number of features changed for varying robustness targets $(1-\alpha)$.
  • ...and 7 more figures

Theorems & Definitions (11)

  • Definition: Validity
  • Definition: Algorithmic robustness
  • Lemma
  • Proposition
  • Proposition
  • Lemma
  • Proposition
  • Proposition: Asymptotic consistency
  • Proposition: Finite-sample guarantees
  • Lemma
  • ...and 1 more