S-CFE: Simple Counterfactual Explanations

Shpresim Sadiku, Moritz Wagner, Sai Ganesh Nagarajan, Sebastian Pokutta

TL;DR

S-CFE tackles the challenge of generating counterfactual explanations that are both sparse and aligned with the data distribution by framing CFEs as a non-convex, non-smooth optimization problem and solving it with an Accelerated Proximal Gradient method. The approach relaxes hard constraints into differentiable penalties, enabling the integration of diverse plausibility measures (KDE, GMM, kNN-based LOF/density gravity) and exact sparsity control via a proximal step, while supporting box-constrained features. Empirically, S-CFE delivers highly valid CFEs that are close to the factual data, sparse in feature changes, and remain within high-density regions of the target class with competitive runtime across datasets (Boston Housing, Wine, Breast Cancer, MNIST) and models (MLPs and CNNs). The work demonstrates a practical and flexible tool for interpretable explanations in safety-critical domains, while acknowledging that sparsity and plausibility do not guarantee real-world interventions or causal improvements, and outlining directions for future work toward training directly on data distributions and predictions.

Abstract

We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. Enforcing \emph{sparsity}, i.e., shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.
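
To make the abstract's ingredients concrete, the following is a minimal sketch of a proximal gradient loop with momentum for the $p = 0$ case, with box constraints on the features. It is not the paper's algorithm: the linear logistic classifier, the plain FISTA-style extrapolation, the fixed step size, and every name (`prox_l0_box`, `smooth_loss_grad`, `sparse_cfe`, `lam`, `step`) are assumptions chosen for illustration, and no plausibility term is included.

```python
import numpy as np

def prox_l0_box(z, x, lam, lo, hi):
    """Exact prox of lam*||d||_0 plus the indicator of lo <= x + d <= hi.

    Assumes the factual point x already lies in the box, so d = 0 is feasible.
    """
    c = np.clip(z, lo - x, hi - x)                  # best nonzero candidate per coordinate
    keep = 0.5 * (c - z) ** 2 + lam < 0.5 * z ** 2  # is changing the feature worth lam?
    return np.where(keep, c, 0.0)

def smooth_loss_grad(d, x, w, b, target):
    """Gradient w.r.t. d of the logistic loss of a linear classifier at x + d."""
    p = 1.0 / (1.0 + np.exp(-(w @ (x + d) + b)))
    return (p - target) * w

def sparse_cfe(x, w, b, target=1.0, lam=0.05, step=0.5, iters=200, lo=0.0, hi=1.0):
    """Hypothetical helper: returns a counterfactual x + d with a sparse d."""
    d = np.zeros_like(x)
    d_prev = d.copy()
    t_prev, t = 1.0, 1.0
    for _ in range(iters):
        y = d + ((t_prev - 1.0) / t) * (d - d_prev)   # momentum (extrapolation) step
        g = smooth_loss_grad(y, x, w, b, target)       # gradient of the smooth part
        d_prev = d
        d = prox_l0_box(y - step * g, x, step * lam, lo, hi)  # sparsity + box in one prox
        t_prev, t = t, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
    return x + d

# Toy usage: a factual point classified as 0 by a hand-picked linear model.
x = np.array([0.2, 0.5, 0.5, 0.5])
w = np.array([4.0, 0.1, -0.1, 0.05])
b = -2.0
x_cf = sparse_cfe(x, w, b)
```

The proximal step handles the non-smooth sparsity penalty and the box constraint exactly and coordinate-wise, so the gradient step only ever sees the smooth classifier loss. A differentiable plausibility penalty would simply be added to that smooth part.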

Paper Structure

This paper contains 27 sections, 20 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Examples of possible CFEs for an input image of the digit 9 when changing the classification to 4: Sparsity constraints alone produce adversarial examples, while plausibility constraints lead to unrealistic CFEs. Combining both yields CFEs that are sparse yet aligned with the target class 4's data manifold.
  • Figure 2: A simple dataset illustrates the need for a plausibility term in CFE algorithms. (a) Our S-CFE method without a plausibility term generates CFEs near the factual blue data points, but they remain distant from the distribution of correctly classified orange data points. (b) Our S-CFE$_{\textnormal{KDE}}$ method, which combines S-CFE with a KDE-based plausibility term, produces CFEs within high-density regions. (c) Similarly, S-CFE$_{\textnormal{kNN}}$, which combines S-CFE with a $k$-NN-based plausibility term, generates CFEs near the boundary of high-density regions. The green trajectory connecting the green data points represents the iterates of our S-CFE algorithm. The dashed black line represents the decision boundary of a linear classifier.
  • Figure 3: Robustness of the different methods. The $x$-axis shows the distance of the input data points to the original data points; the $y$-axis shows the distance of the generated CFEs to the CFE generated from the original data points. Tested on 100 data points from each data set.
  • Figure 4: A toy example illustrating the positioning of convex combinations of $k$-NNs obtained via density gravity relative to the original points.
  • Figure 5: Example reproduced from tsiourvas2024manifold, comparing MIP-Live-m=1 (tsiourvas2024manifold) geometrically with our S-CFE approach. The CFE generated by our method resides in a high-density region and is sparse. MIP-Live-m=1 considerably restricts the working space (the small bounded red region) and uses only 1 neighbor for the LOF manifold-adhering constraint.
  • ...and 1 more figures
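
The KDE-based plausibility term mentioned in the Figure 2 caption can be illustrated with a small sketch. It assumes an isotropic Gaussian kernel with a fixed bandwidth `h` and uses the negative log-density as the penalty; the paper's exact form may differ, and the function name `kde_penalty_and_grad` is hypothetical.

```python
import numpy as np

def kde_penalty_and_grad(x, data, h=0.5):
    """Negative log Gaussian-KDE density at x (up to an additive constant) and its gradient.

    data: (n, d) array of target-class samples; h: assumed kernel bandwidth.
    """
    diff = data - x                                        # (n, d) offsets to each sample
    logits = -np.sum(diff ** 2, axis=1) / (2.0 * h ** 2)   # log kernel values
    m = logits.max()                                       # log-sum-exp stabilisation
    k = np.exp(logits - m)
    penalty = -(m + np.log(k.sum())) + np.log(len(data))   # -log of the mean kernel value
    weights = k / k.sum()                                  # softmax responsibilities
    grad = -(weights @ diff) / h ** 2                      # gradient of penalty w.r.t. x
    return penalty, grad

# Toy usage with synthetic target-class samples.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 2))
x0 = np.array([0.3, -0.2])
penalty, grad = kde_penalty_and_grad(x0, data)
```

Because the penalty is differentiable, a descent step along `-grad` moves a candidate toward regions where the target-class samples are dense, which matches the behaviour the caption describes for panel (b).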

Theorems & Definitions (8)

  • Definition 3.1: parikh2014proximal
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4: Local Outlier Factor (LOF)
  • Remark 4.1
  • Definition A.1
  • Definition A.2
  • Definition A.3