Generalizability vs. Counterfactual Explainability Trade-Off
Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei
TL;DR
This work introduces the $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP), $p_i^{\varepsilon}$, which quantifies how easily a perturbation within the $\varepsilon$-neighborhood of each data point can flip a model's prediction. The authors establish a rigorous link between $p_i^{\varepsilon}$ and the geometry of the decision boundary, deriving exact formulas for linear models and a local approximation for non-linear models. They show that the dataset average $\bar{p}^{\varepsilon}$ increases as decision margins shrink, as they do when a model overfits. They argue that $\bar{p}^{\varepsilon}$ can serve as a practical proxy for model generalizability, and they support this with experiments on the Water Potability and Air Quality datasets using logistic regression and MLPs, in which unregularized models exhibit higher $\bar{p}^{\varepsilon}$. The work highlights a fundamental trade-off: models that generalize better have harder-to-find counterfactuals, while overfitted models admit more accessible counterfactual explanations. This offers a quantitative lens for balancing explainability and performance.
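The exact linear-model formulas are not reproduced in this summary. As a minimal sketch, assume that $p_i^{\varepsilon}$ is the probability that a perturbation drawn uniformly from the Euclidean ball of radius $\varepsilon$ around a point flips the prediction of a linear classifier (the paper's precise perturbation distribution may differ). In two dimensions, with $h$ the distance from the point to the decision line and $t = h/\varepsilon$, this probability is the area of a circular segment divided by the disc area, $\frac{1}{\pi}\left(\arccos t - t\sqrt{1 - t^2}\right)$ for $t < 1$ and $0$ otherwise. The hypothetical helpers `eps_vcp_linear_2d` (closed form, 2-D only) and `eps_vcp` (Monte Carlo, any dimension) compute both so they can be compared.

```python
import math
import random

# Assumption: perturbations are uniform in the L2 eps-ball around the point.
# This is an illustrative choice, not necessarily the paper's definition.

def sample_in_ball(dim, eps, rng):
    """Draw a point uniformly from the L2 ball of radius eps centred at the origin."""
    # Direction: a normalized standard Gaussian vector is uniform on the sphere.
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    # Radius: eps * U^(1/dim) yields a uniform density over the ball's volume.
    r = eps * rng.random() ** (1.0 / dim)
    return [r * v / norm for v in g]

def predict(w, b, x):
    """Linear classifier: label 1 if w.x + b >= 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0

def eps_vcp(w, b, x, eps, n_samples=20000, seed=0):
    """Monte Carlo estimate of the fraction of eps-ball perturbations that flip the label."""
    rng = random.Random(seed)
    y = predict(w, b, x)
    flips = 0
    for _ in range(n_samples):
        delta = sample_in_ball(len(x), eps, rng)
        x_pert = [xi + di for xi, di in zip(x, delta)]
        if predict(w, b, x_pert) != y:
            flips += 1
    return flips / n_samples

def eps_vcp_linear_2d(w, b, x, eps):
    """Closed form for a 2-D linear classifier: circular-segment area over disc area."""
    # Euclidean distance from x to the line w.x + b = 0.
    h = abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / math.sqrt(sum(wi * wi for wi in w))
    if h >= eps:
        return 0.0  # the whole disc lies on the point's side of the boundary
    t = h / eps
    return (math.acos(t) - t * math.sqrt(1.0 - t * t)) / math.pi

w, b, eps = [1.0, 0.0], 0.0, 1.0  # decision line x1 = 0
print(round(eps_vcp_linear_2d(w, b, [0.5, 0.0], eps), 4))  # 0.1955
print(eps_vcp(w, b, [0.5, 0.0], eps))  # Monte Carlo estimate close to 0.1955
```

Shrinking the margin by moving the point toward the line (for example, from $h = 0.6$ to $h = 0.2$) raises the value. This is the monotone dependence on the margin that the summary describes.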
Abstract
In this work, we investigate the relationship between model generalization and counterfactual explainability in supervised learning. We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP): the probability of finding perturbations of a data point within its $\varepsilon$-neighborhood that change the model's predicted label. We provide a theoretical analysis of $\varepsilon$-VCP in relation to the geometry of the model's decision boundary, showing that $\varepsilon$-VCP tends to increase with model overfitting. Our findings establish a rigorous connection between poor generalization and the ease of counterfactual generation, revealing an inherent trade-off between generalization and counterfactual explainability. Empirical results validate our theory, suggesting that $\varepsilon$-VCP can serve as a practical proxy for quantitatively characterizing overfitting.
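The claim that $\varepsilon$-VCP grows with overfitting can be illustrated by averaging per-point estimates over a dataset. This minimal sketch again assumes perturbations uniform in the Euclidean $\varepsilon$-ball. It is not the paper's experimental setup, which uses logistic regression and MLPs. Instead, it uses a hypothetical toy dataset with one noisy label and swaps in two stand-in models: a fixed linear rule that ignores the noisy point, and a 1-nearest-neighbour classifier that memorizes every training label. The helper `mean_eps_vcp` is likewise hypothetical.

```python
import math
import random

def sample_in_ball(dim, eps, rng):
    """Uniform sample from the L2 ball of radius eps (assumed perturbation model)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    r = eps * rng.random() ** (1.0 / dim)
    return [r * v / norm for v in g]

def mean_eps_vcp(predict, points, eps, n_samples=2000, seed=0):
    """Average over the data of the Monte Carlo eps-VCP estimate at each point."""
    rng = random.Random(seed)
    total = 0.0
    for x in points:
        y = predict(x)  # flips are measured against the model's own prediction
        flips = 0
        for _ in range(n_samples):
            d = sample_in_ball(len(x), eps, rng)
            if predict([xi + di for xi, di in zip(x, d)]) != y:
                flips += 1
        total += flips / n_samples
    return total / len(points)

# Hypothetical toy data: two separated clusters plus one noisy label.
X = [[-2.5, 0.0], [-2.0, 0.5], [-2.0, -0.5], [-3.0, 0.0],
     [2.5, 0.0], [2.0, 0.5], [2.0, -0.5], [3.0, 0.0], [2.5, 1.0]]
Y = [0, 0, 0, 0, 1, 1, 1, 1, 0]  # the last point sits in the label-1 cluster

def linear_model(x):
    # Boundary at x1 = 0; misclassifies the noisy point (1 training error).
    return 1 if x[0] >= 0 else 0

def one_nn_model(x):
    # 1-nearest neighbour: zero training error, memorizes the noisy label.
    dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X]
    return Y[dists.index(min(dists))]

eps = 1.0
print(mean_eps_vcp(linear_model, X, eps))  # 0.0: every point lies farther than eps from x1 = 0
print(mean_eps_vcp(one_nn_model, X, eps))  # positive: memorizing the noise puts boundaries within eps
```

The memorizing model fits the training data perfectly, but it places decision boundaries close to the noisy point and its neighbours. As a result, its average $\varepsilon$-VCP is higher, in the direction the abstract predicts.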
