Counterfactual Explanations for Model Ensembles Using Entropic Risk Measures
Erfaun Noorani, Pasan Dissanayake, Faisal Hamman, Sanghamitra Dutta
TL;DR
The paper tackles the challenge of generating counterfactual explanations that remain valid across a fixed ensemble of models. It introduces an entropic risk measure with a risk-aversion parameter $\theta$ that quantifies how reliably a counterfactual is accepted across the ensemble. This measure serves as a constraint in an optimization that balances cost against multi-model validity, and varying $\theta$ unifies risk-sensitive and worst-case approaches. The authors formulate the constrained optimization using an empirical average over the ensemble and prove that the worst-case (min-max) formulation is the limiting case as $\theta \to \infty$, which connects the two formulations both theoretically and practically. Empirical results on real-world datasets (HELOC, German Credit, Adult) show a clear cost-validity trade-off controlled by $\theta$ and demonstrate the method's ability to generate counterfactuals that remain valid for a larger fraction of the models. Overall, the work links explainability with risk-aware robust optimization, offers a tunable framework for reliable ensemble explanations, and highlights avenues for computationally efficient extensions.
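The limiting behavior described above can be checked numerically. The sketch below assumes the standard empirical form of the entropic risk of per-model losses $\ell_1,\dots,\ell_M$, namely $\rho_\theta = \frac{1}{\theta}\log\big(\frac{1}{M}\sum_{m=1}^{M} e^{\theta \ell_m}\big)$. The function name `entropic_risk` and the loss values are hypothetical, and the paper's exact choice of per-model loss may differ. For small $\theta$ the value approaches the ensemble mean loss; for large $\theta$ it approaches the worst-case (maximum) loss.

```python
import math

def entropic_risk(losses, theta):
    """Empirical entropic risk: (1/theta) * log(mean(exp(theta * loss))).

    theta > 0 is the risk-aversion parameter. Shifting by the maximum loss
    (log-sum-exp trick) keeps exp() from overflowing when theta is large.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    worst = max(losses)
    shifted_mean = sum(math.exp(theta * (l - worst)) for l in losses) / len(losses)
    return worst + math.log(shifted_mean) / theta

# Hypothetical per-model losses of one candidate counterfactual (M = 3 models).
losses = [0.1, 0.4, 0.9]
for theta in (1e-6, 1.0, 10.0, 1e4):
    print(f"theta={theta:g}: risk={entropic_risk(losses, theta):.4f}")
# Small theta -> close to the mean loss (about 0.4667);
# large theta -> close to the max loss (about 0.9).
```

The shift by the maximum loss matters precisely in the large-$\theta$ regime that recovers the min-max formulation, where a naive `exp(theta * loss)` would overflow.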
Abstract
Counterfactual explanations indicate the smallest change in input that can translate to a different outcome for a machine learning model. Counterfactuals have generated immense interest in high-stakes applications such as finance, education, and hiring. In several use cases, the decision-making process relies on an ensemble of models rather than a single one. Despite significant research on counterfactuals for one model, the problem of generating a single counterfactual explanation for an ensemble of models has received limited attention. Each individual model might lead to a different counterfactual, whereas trying to find a counterfactual accepted by all models might significantly increase cost (effort). We propose a novel strategy to find the counterfactual for an ensemble of models from the perspective of the entropic risk measure. Entropic risk is a convex risk measure that satisfies several desirable properties. We incorporate our proposed risk measure into a novel constrained optimization to generate counterfactuals for ensembles that stay valid for several models. The main significance of our measure is that it provides a knob that allows for the generation of counterfactuals that stay valid for an adjustable fraction of the models. We also show that a limiting case of our entropic-risk-based strategy yields a counterfactual valid for all models in the ensemble (the worst-case min-max approach). We study the trade-off between the cost (effort) of the counterfactual and its validity for an ensemble under varying degrees of risk aversion, as determined by our risk-parameter knob. We validate our approach on real-world datasets.
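To make the constrained formulation concrete, here is a minimal sketch under explicit assumptions that are not taken from the paper: the ensemble is a set of hypothetical logistic-regression models, each model's loss is its logistic loss for the desired class, the cost is squared Euclidean distance, and the constraint bounds the empirical entropic risk by a threshold `tau`. The helper names (`ensemble_losses`, `find_counterfactual`) and the use of SciPy's SLSQP solver are illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Hypothetical ensemble: M logistic-regression models sharing a base weight
# vector but differing by small perturbations (a stand-in for retrained models).
M, d = 5, 3
W = np.array([1.0, -0.5, 0.8]) + 0.2 * rng.standard_normal((M, d))
b = -1.0 + 0.1 * rng.standard_normal(M)

def ensemble_losses(x_cf):
    # Logistic loss of each model for the desired class 1: log(1 + exp(-score)).
    scores = W @ x_cf + b
    return np.logaddexp(0.0, -scores)

def entropic_risk(losses, theta):
    # (1/theta) * log(mean(exp(theta * losses))), computed stably.
    return (logsumexp(theta * losses) - np.log(len(losses))) / theta

def find_counterfactual(x, theta, tau):
    # Minimize squared L2 cost subject to entropic risk <= tau.
    # SciPy "ineq" constraints require fun(z) >= 0.
    cons = {"type": "ineq",
            "fun": lambda z: tau - entropic_risk(ensemble_losses(z), theta)}
    res = minimize(lambda z: np.sum((z - x) ** 2), x0=x.copy(),
                   method="SLSQP", constraints=[cons])
    return res.x

x = np.zeros(d)  # hypothetical rejected input (all five models predict class 0)
tau = 0.5        # logistic loss 0.5 corresponds to a probability of about 0.61
for theta in (0.1, 10.0):
    x_cf = find_counterfactual(x, theta, tau)
    valid = np.mean(W @ x_cf + b > 0)  # fraction of models predicting class 1
    print(f"theta={theta}: cost={np.sum((x_cf - x) ** 2):.3f}, valid fraction={valid:.2f}")
```

In this toy setup the logistic losses are convex in the input and log-sum-exp is convex and nondecreasing in each argument, so the feasible set is convex and the solver should reach the unique optimum. Because entropic risk is nondecreasing in $\theta$, raising $\theta$ shrinks the feasible set: the cost cannot decrease, and the maximum per-model loss is forced below `tau + log(M) / theta`, pushing the counterfactual toward validity for every model.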
