ACE: Adaptive sampling for Counterfactual Explanations
Margarita A. Guerrero, Cristian R. Rojas
TL;DR
ACE addresses the challenge of generating minimal, feasible counterfactual explanations (CFEs) under limited model access. It reframes CFE generation as a Bayesian optimization problem, using a Gaussian process (GP) surrogate for a latent boundary function $f$ and a penalty-based objective. Its extended cost enforces proximity, sparsity, and plausibility while respecting actionability constraints, and it is minimized with a hybrid continuous-discrete optimizer guided by expected improvement (EI) sampling. Empirical results on eight real-world datasets, together with qualitative visualizations, show superior sample efficiency, high validity, and coherent, actionable counterfactuals compared with state-of-the-art methods. The approach enables scalable, interpretable explanations of black-box classifiers in settings where queries are costly or limited.
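To make the four ingredients of the extended cost concrete, the Python sketch below scores a single candidate counterfactual. Every detail is an assumption for illustration, not ACE's actual formulation: the function name `extended_cost`, the weights `lam_sparse`, `lam_plaus`, and `penalty`, range-normalized L1 distance for proximity, a count of changed features for sparsity, distance outside per-feature bounds as a stand-in for plausibility, and a hard constraint on immutable features for actionability. Validity enters as a fixed penalty when the black-box prediction misses the target class.

```python
import math

def extended_cost(x_cf, x0, lo, hi, mutable, predict, target,
                  lam_sparse=0.5, lam_plaus=1.0, penalty=100.0):
    """Hypothetical penalized cost of a candidate counterfactual x_cf."""
    # Actionability (hard constraint in this sketch): immutable features must not change
    if any(not m and a != b for a, b, m in zip(x_cf, x0, mutable)):
        return math.inf
    # Proximity: L1 distance, each feature scaled by its range (assumes hi > lo)
    proximity = sum(abs(a - b) / (h - l) for a, b, l, h in zip(x_cf, x0, lo, hi))
    # Sparsity: number of features that were changed
    sparsity = sum(1 for a, b in zip(x_cf, x0) if a != b)
    # Plausibility stand-in: how far the candidate leaves the given feature ranges
    plausibility = sum(max(l - a, 0.0) + max(a - h, 0.0)
                       for a, l, h in zip(x_cf, lo, hi))
    # Validity: penalty when the black-box prediction misses the target class
    validity = 0.0 if predict(x_cf) == target else penalty
    return proximity + lam_sparse * sparsity + lam_plaus * plausibility + validity

# Toy usage (hypothetical): class 1 iff x[0] + x[2] > 4; feature 1 is immutable
x0, lo, hi = [1.0, 0.0, 2.0], [0.0, 0.0, 0.0], [4.0, 1.0, 4.0]
mutable = [True, False, True]

def predict(x):
    return int(x[0] + x[2] > 4)

print(extended_cost([1.0, 0.0, 3.5], x0, lo, hi, mutable, predict, target=1))  # 0.875
```

Under these assumptions, raising only the third feature from 2.0 to 3.5 flips the toy prediction at cost 0.375 + 0.5 = 0.875, while any change to the immutable second feature is rejected outright. Replacing the hard constraint with a large finite penalty would keep the objective finite everywhere, which some optimizers handle more gracefully.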
Abstract
Counterfactual Explanations (CFEs) interpret machine learning models by identifying the smallest change to the input features that changes the model's prediction to a desired output. For classification tasks, CFEs indicate how close a given sample is to the decision boundary of a trained classifier. Existing methods are often sample-inefficient: they require many evaluations of the black-box model, which is costly and impractical when access to the model is limited. We propose Adaptive sampling for Counterfactual Explanations (ACE), a sample-efficient algorithm that combines Bayesian estimation and stochastic optimization to approximate the decision boundary with fewer queries. By prioritizing informative points, ACE minimizes the number of model evaluations while generating accurate and feasible CFEs. Extensive empirical results show that ACE achieves higher evaluation efficiency than state-of-the-art methods while remaining effective at identifying minimal and actionable changes.
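The query-saving idea behind adaptive sampling can be illustrated on a one-dimensional toy problem. This sketch is not ACE: as a simplification, it fits a Gaussian process directly to a penalized cost (distance from the original point plus a fixed penalty when the prediction misses the target), whereas ACE places its GP on a latent boundary function. The classifier `black_box`, its threshold 2.3, the RBF kernel hyperparameters, the candidate grid, and the query budget are all hypothetical. At each step, the next black-box query is the unqueried candidate with the highest expected improvement over the best cost observed so far.

```python
import math

def rbf(a, b, length=1.0, signal_var=25.0):
    # Squared-exponential kernel on scalar inputs
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(K):
    # Lower-triangular L with K = L L^T
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(K[i][i] - s, 1e-12))
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L y = b
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return y

def solve_upper_t(L, y):
    # Back substitution: solve L^T x = y
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(X, y, jitter=1e-4):
    # GP regression on mean-centered targets; jitter keeps K well conditioned
    m = sum(y) / len(y)
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [v - m for v in y]))
    return X, L, alpha, m

def gp_predict(model, xq):
    # Posterior mean and standard deviation at a query point
    X, L, alpha, m = model
    k = [rbf(xq, a) for a in X]
    mu = m + sum(ki * ai for ki, ai in zip(k, alpha))
    v = solve_lower(L, k)
    var = rbf(xq, xq) - sum(vi * vi for vi in v)
    return mu, math.sqrt(max(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI for minimization under a Gaussian posterior N(mu, sigma^2)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return max((best - mu) * cdf + sigma * pdf, 0.0)

def black_box(x):
    # Hypothetical classifier: class 1 iff x >= 2.3
    return 1 if x >= 2.3 else 0

def penalized_cost(x, x0=0.0, target=1, penalty=10.0):
    # Distance to x0, plus a penalty if the prediction misses the target
    return abs(x - x0) + (0.0 if black_box(x) == target else penalty)

def ei_search(budget=8):
    candidates = [i * 0.05 for i in range(101)]  # grid on [0, 5]
    X = [0.0, 2.5, 5.0]                          # initial design
    y = [penalized_cost(x) for x in X]
    for _ in range(budget):
        model = gp_fit(X, y)
        best = min(y)
        scored = [(expected_improvement(*gp_predict(model, c), best), c)
                  for c in candidates
                  if all(abs(c - x) > 1e-9 for x in X)]
        _, x_next = max(scored)
        X.append(x_next)
        y.append(penalized_cost(x_next))  # exactly one black-box query per step
    i = min(range(len(y)), key=y.__getitem__)
    return X[i], y[i], len(X)

x_best, cost_best, n_queries = ei_search()
print(f"best x = {x_best:.2f}, cost = {cost_best:.2f}, queries = {n_queries}")
```

Because each iteration spends exactly one black-box query, the budget directly bounds model access. Expected improvement balances candidates with a low predicted cost against candidates where the surrogate is still uncertain, which is one common way to prioritize informative points.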
