S-CFE: Simple Counterfactual Explanations
Shpresim Sadiku, Moritz Wagner, Sai Ganesh Nagarajan, Sebastian Pokutta
TL;DR
S-CFE tackles the challenge of generating counterfactual explanations (CFEs) that are both sparse and aligned with the data distribution. It frames CFE generation as a non-convex, non-smooth optimization problem and solves it with an accelerated proximal gradient (APG) method. The approach relaxes hard constraints into differentiable penalties, which makes it possible to plug in diverse plausibility measures (KDE, GMM, and kNN-based LOF and density gravity). It controls sparsity exactly through a proximal step on non-smooth $\ell_p$ ($0 \leq p < 1$) penalties and supports box-constrained features. Empirically, S-CFE produces CFEs with high validity that stay close to the factual input, change few features, and remain within high-density regions of the target class, with competitive runtime across datasets (Boston Housing, Wine, Breast Cancer, MNIST) and models (MLPs and CNNs). The work offers a practical and flexible tool for interpretable explanations in safety-critical domains. It also acknowledges that sparsity and plausibility do not guarantee real-world interventions or causal improvements, and it outlines future work toward training directly on data distributions and predictions.
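To illustrate how a plausibility measure can become a differentiable penalty, the sketch below uses the negative log-density of a Gaussian KDE fitted to target-class samples. It is a minimal sketch under stated assumptions, not S-CFE's published code: the function name `kde_penalty_and_grad`, the isotropic Gaussian kernel, and the bandwidth `h` are hypothetical choices for illustration. The gradient has a closed form, so a descent step pulls the candidate toward a softmax-weighted mean of nearby samples.

```python
import numpy as np

def kde_penalty_and_grad(x, samples, h=1.0):
    """Hypothetical plausibility term: -log of a Gaussian KDE on target-class samples.

    The kernel's normalizing constant is dropped, so the penalty is defined up to
    an additive constant, which does not affect the gradient.
    """
    a = -np.sum((samples - x) ** 2, axis=1) / (2 * h * h)  # log-kernel value per sample
    m = a.max()                                              # log-sum-exp shift for stability
    w = np.exp(a - m)
    s = w.sum()
    penalty = -(m + np.log(s)) + np.log(len(samples))        # -log((1/n) * sum(exp(a)))
    w /= s                                                   # softmax weights over samples
    grad = (x - w @ samples) / (h * h)                       # descent moves x toward the weighted mean
    return penalty, grad
```

In an objective of this kind, such a term would be added to the classifier loss with some weight (a hypothetical `beta`); any penalty with a usable gradient, such as a GMM log-likelihood, could be substituted in the same way.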
Abstract
We study the problem of finding optimal sparse, manifold-aligned counterfactual explanations for classifiers. Canonically, this can be formulated as an optimization problem with multiple non-convex components, including classifier loss functions and manifold alignment (or \emph{plausibility}) metrics. Enforcing \emph{sparsity}, i.e., shorter explanations, complicates the problem further. Existing methods often focus on specific models and plausibility measures, relying on convex $\ell_1$ regularizers to enforce sparsity. In this paper, we tackle the canonical formulation using the accelerated proximal gradient (APG) method, a simple yet efficient first-order procedure capable of handling smooth non-convex objectives and non-smooth $\ell_p$ (where $0 \leq p < 1$) regularizers. This enables our approach to seamlessly incorporate various classifiers and plausibility measures while producing sparser solutions. Our algorithm only requires differentiable data-manifold regularizers and supports box constraints for bounded feature ranges, ensuring the generated counterfactuals remain \emph{actionable}. Finally, experiments on real-world datasets demonstrate that our approach effectively produces sparse, manifold-aligned counterfactual explanations while maintaining proximity to the factual data and computational efficiency.
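The formulation can be sketched as minimizing $f(x+\delta) + \lambda \|\delta\|_0$ over perturbations $\delta$ subject to box constraints on $x+\delta$. Below is a minimal APG sketch in NumPy, not the authors' implementation. It assumes $p = 0$, a user-supplied smooth loss and gradient, a fixed step size below $1/L$, and a simple monotone fallback; `prox_l0_box` and `apg_counterfactual` are hypothetical names. Because the $\ell_0$ penalty and the box are separable, the proximal step is computed coordinate-wise by comparing "leave the feature unchanged" against "clip the change into the box".

```python
import numpy as np

def prox_l0_box(v, lam, step, lo, hi):
    """Prox of step*lam*||z||_0 plus the indicator of lo <= z <= hi, per coordinate.

    Assumes lo <= 0 <= hi, so z = 0 is always feasible.
    """
    c = np.clip(v, lo, hi)                                  # best nonzero candidate in the box
    cost_keep = (c - v) ** 2 / (2 * step) + lam * (c != 0)  # keep a (clipped) change
    cost_zero = v ** 2 / (2 * step)                         # leave the feature unchanged
    return np.where(cost_keep < cost_zero, c, 0.0)


def apg_counterfactual(x, loss, grad, lam, lo, hi, step=0.1, iters=500):
    """Hypothetical solver: minimize loss(x + d) + lam * ||d||_0 s.t. lo <= x + d <= hi."""
    dlo, dhi = lo - x, hi - x          # box expressed on the perturbation d
    d = np.zeros_like(x, dtype=float)
    d_prev = d.copy()
    t = 1.0

    def objective(delta):
        return loss(x + delta) + lam * np.count_nonzero(delta)

    for _ in range(iters):
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = d + ((t - 1) / t_next) * (d - d_prev)           # Nesterov extrapolation
        z = prox_l0_box(y - step * grad(x + y), lam, step, dlo, dhi)
        if objective(z) > objective(d):
            # Monotone safeguard: plain proximal gradient step, momentum reset.
            z = prox_l0_box(d - step * grad(x + d), lam, step, dlo, dhi)
            t_next = 1.0
        d_prev, d, t = d, z, t_next
    return x + d


# Toy linear-logistic model: push the score w @ x' + b above 0 (target class 1).
w, b = np.array([3.0, 0.1, 0.1]), -2.0
loss = lambda xp: np.logaddexp(0.0, -(w @ xp + b))  # softplus(-score)
grad = lambda xp: -w / (1.0 + np.exp(w @ xp + b))    # gradient of the loss w.r.t. xp
x = np.zeros(3)
x_cf = apg_counterfactual(x, loss, grad, lam=0.05, lo=-np.ones(3), hi=np.ones(3))
```

On this toy model, the $\ell_0$ proximal step leaves the two weakly weighted features at zero and moves only the first feature, up to its box bound, which flips the prediction. A fractional $\ell_p$ penalty would replace `prox_l0_box` with the corresponding coordinate-wise proximal map.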
