Generalizability vs. Counterfactual Explainability Trade-Off

Fabiano Veglianti, Flavio Giorgi, Fabrizio Silvestri, Gabriele Tolomei

TL;DR

This work introduces the $\varepsilon$-valid counterfactual probability, $p_i^{\varepsilon}$, to quantify how easily counterfactual perturbations can flip a model's prediction within an $\varepsilon$-neighborhood of each data point. The authors establish a rigorous link between $p_i^{\varepsilon}$ and the geometry of the decision boundary, deriving exact formulas for linear models and a local approximation for non-linear models, and show that the average $\bar{p}^{\varepsilon}$ increases as margins shrink, i.e., with overfitting. They argue that $\bar{p}^{\varepsilon}$ serves as a practical proxy for model generalizability, supported by an empirical evaluation on the Water Potability and Air Quality datasets using logistic regression and MLPs, where unregularized models exhibit higher $\bar{p}^{\varepsilon}$ than regularized ones. The work highlights a fundamental trade-off: models that generalize better have harder-to-find counterfactuals, while overfitted models yield more accessible counterfactual explanations, offering a quantitative lens for balancing explainability and performance.
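For a linear classifier, the TL;DR states that $p_i^{\varepsilon}$ admits an exact formula. A minimal sketch, assuming perturbations are drawn uniformly from the Euclidean $\varepsilon$-ball (the paper's exact distribution, norm, and formula may differ), is that $p_i^{\varepsilon}$ equals the fraction of the ball's volume lying beyond a hyperplane at distance $\gamma_i$ from $\boldsymbol{x}_i$. The helpers `cap_fraction`, `linear_margin`, and `mean_vcp` are hypothetical names; the volume fraction is computed by integrating the one-dimensional marginal of a uniform point in the ball.

```python
import math
from typing import Sequence


def cap_fraction(margin: float, eps: float, dim: int, steps: int = 2000) -> float:
    """Fraction of a uniform dim-dimensional eps-ball lying beyond a hyperplane
    at distance `margin` from its center (assumed eps-VCP of a linear model)."""
    if margin >= eps:
        return 0.0  # the decision boundary does not reach the eps-ball
    t = margin / eps
    k = (dim - 1) / 2.0

    def density(u: float) -> float:
        # Unnormalized marginal density of one coordinate of a point drawn
        # uniformly from the unit ball: (1 - u^2)^((dim - 1) / 2) on [-1, 1].
        return max(0.0, 1.0 - u * u) ** k

    def simpson(a: float, b: float) -> float:
        # Composite Simpson's rule on an even number of sub-intervals.
        n = steps + (steps % 2)
        h = (b - a) / n
        total = density(a) + density(b)
        for j in range(1, n):
            total += (4 if j % 2 else 2) * density(a + j * h)
        return total * h / 3.0

    return simpson(t, 1.0) / simpson(-1.0, 1.0)


def linear_margin(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Euclidean distance from x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def mean_vcp(w: Sequence[float], b: float, points, eps: float) -> float:
    """Dataset average of the per-point eps-VCP, i.e. the quantity p-bar^eps."""
    return sum(cap_fraction(linear_margin(w, b, x), eps, len(w)) for x in points) / len(points)
```

For example, `mean_vcp([1.0, 0.0, 0.0], 0.0, [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)], 1.0)` returns approximately 0.25: the point on the boundary contributes 0.5 and the distant point contributes 0. Points with smaller margins raise the average, which is the mechanism the authors connect to overfitting. For even dimensions the integrand has square-root behavior at the endpoints, so Simpson's rule converges more slowly there; a regularized incomplete beta function would give the same fraction in closed form.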

Abstract

In this work, we investigate the relationship between model generalization and counterfactual explainability in supervised learning. We introduce the notion of $\varepsilon$-valid counterfactual probability ($\varepsilon$-VCP) -- the probability of finding perturbations of a data point within its $\varepsilon$-neighborhood that result in a label change. We provide a theoretical analysis of $\varepsilon$-VCP in relation to the geometry of the model's decision boundary, showing that $\varepsilon$-VCP tends to increase with model overfitting. Our findings establish a rigorous connection between poor generalization and the ease of counterfactual generation, revealing an inherent trade-off between generalization and counterfactual explainability. Empirical results validate our theory, suggesting $\varepsilon$-VCP as a practical proxy for quantitatively characterizing overfitting.
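The abstract defines $\varepsilon$-VCP as the probability that a perturbation within the $\varepsilon$-neighborhood changes the label. For a black-box model this suggests a Monte Carlo estimate. The sketch below assumes a Euclidean ball, uniform sampling, and a hypothetical `predict` callable returning a discrete label; it is an illustration of the definition, not necessarily the estimator used in the paper.

```python
import math
import random
from typing import Callable, Sequence


def sample_in_ball(center: Sequence[float], eps: float, rng: random.Random) -> list:
    """Draw one point uniformly from the Euclidean ball of radius eps around center."""
    dim = len(center)
    # A normalized Gaussian vector is uniform on the unit sphere.
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0
    # Radius eps * U^(1/dim) makes the point uniform in volume, not just in radius.
    radius = eps * rng.random() ** (1.0 / dim)
    return [c + radius * d / norm for c, d in zip(center, direction)]


def estimate_vcp(predict: Callable[[Sequence[float]], int], x: Sequence[float],
                 eps: float, n_samples: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of the eps-ball around x whose label differs from x's."""
    rng = random.Random(seed)
    label = predict(x)
    flips = sum(predict(sample_in_ball(x, eps, rng)) != label for _ in range(n_samples))
    return flips / n_samples
```

With the hypothetical linear rule `predict = lambda z: int(z[0] > 0)`, a point far from the boundary such as `[3.0, 0.0]` with `eps=1.0` yields an estimate of exactly 0.0, while a point on the boundary yields roughly 0.5. The estimator only needs label queries, so it applies unchanged to MLPs, at the cost of sampling error that shrinks as $1/\sqrt{n}$.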

Paper Structure

This paper contains 19 sections, 4 theorems, 47 equations, 5 figures, 1 table.

Key Result

Lemma 3.1

Let $f_{\boldsymbol{\theta}}$ be a classifier, $\boldsymbol{x}_i \in \mathcal{X}$ a generic input, $\gamma_i$ the geometric margin from $\boldsymbol{x}_i$ to the decision boundary induced by $h_{\boldsymbol{\theta}}$, and $\varepsilon \in \mathbb{R}_{>0}$ a fixed threshold, such that $\varepsilon \ldots$ where $\mathcal{B}(\boldsymbol{x}_i, \rho) = \{\boldsymbol{x} \in \mathcal{X}: \|\boldsymbol{x} - \ldots\}$ …
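The statement of Lemma 3.1 is cut off here, but its name and the definition of the geometric margin suggest the picture: if $h_{\boldsymbol{\theta}}$ is continuous, no point closer to $\boldsymbol{x}_i$ than $\gamma_i$ can carry a different label (the segment joining them would otherwise cross the boundary at distance below $\gamma_i$), so every $\varepsilon$-valid counterfactual lies in the shell $\gamma_i \le \|\boldsymbol{x} - \boldsymbol{x}_i\| \le \varepsilon$. The sketch below checks this numerically for a hypothetical linear score $h(\boldsymbol{z}) = \boldsymbol{w}^\top \boldsymbol{z} + b$ under the Euclidean norm; `shell_check` is an illustrative name, and this is not the lemma's proof.

```python
import math
import random


def shell_check(w, b, x, eps, n_samples=5000, seed=1):
    """Return (margin, distances of label-flipping samples) for the linear score h(z) = w.z + b."""
    rng = random.Random(seed)
    dim = len(x)

    def h(z):
        return sum(wi * zi for wi, zi in zip(w, z)) + b

    margin = abs(h(x)) / math.sqrt(sum(wi * wi for wi in w))
    flipped = []
    for _ in range(n_samples):
        # Uniform sample in the eps-ball: random direction, radius eps * U^(1/dim).
        d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        n = math.sqrt(sum(v * v for v in d)) or 1.0
        r = eps * rng.random() ** (1.0 / dim)
        z = [xi + r * di / n for xi, di in zip(x, d)]
        if (h(z) > 0) != (h(x) > 0):
            flipped.append(math.dist(x, z))
    return margin, flipped
```

For instance, `shell_check([1.0, 1.0], -1.0, [0.0, 0.0], 1.5)` reports a margin of about 0.707, and every recorded distance falls between that margin and 1.5. When $\varepsilon \le \gamma_i$ the list is empty, so the estimated $\varepsilon$-VCP is zero.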

Figures (5)

  • Figure 1: Distance between an input data point ($\boldsymbol{x}$) and its counterfactual example ($\widetilde{\boldsymbol{x}}$): on average, this distance may be larger for a well-trained model (a) than for an overfitted model (b).
  • Figure 2: The $\varepsilon$-valid counterfactual probability for a sample $\boldsymbol{x} \in \mathbb{R}^2$ can be estimated as the fraction of the area of the disk centered at $\boldsymbol{x}$ with radius $\varepsilon$ that lies beyond the decision boundary (in red).
  • Figure 3: Evolution of training accuracy (a) and of $\bar{p}^{\varepsilon}$ and $\bar{\gamma}$ (b) for logistic regression and MLP models.
  • Figure 4: Comparison of the $\varepsilon$-VCP across two different datasets: (a) Water and (b) Air Quality.
  • Figure 5: Monotonicity of $g(\bar{\gamma})$.
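The area ratio described for Figure 2 has a closed form when the decision boundary is locally a straight line: the red region is a circular segment whose area is $\varepsilon^2\big(\arccos t - t\sqrt{1-t^2}\big)$ with $t = \gamma/\varepsilon$, so dividing by $\pi\varepsilon^2$ gives the fraction. A minimal sketch, assuming a linear boundary and the uniform measure on the Euclidean disk (`disk_vcp` is a hypothetical name, and this is not claimed to be the paper's function $g$):

```python
import math


def disk_vcp(margin: float, eps: float) -> float:
    """Fraction of the disk of radius eps lying beyond a line at distance `margin` from its center."""
    if margin >= eps:
        return 0.0  # the line does not cut the disk
    t = margin / eps
    # Circular-segment area eps^2 * (acos(t) - t * sqrt(1 - t^2)) over the disk area pi * eps^2.
    return (math.acos(t) - t * math.sqrt(1.0 - t * t)) / math.pi


for m in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"margin={m:.2f}  eps-VCP={disk_vcp(m, 1.0):.4f}")
```

The printed values fall from 0.5 at zero margin to 0 once the margin reaches $\varepsilon$, which matches the qualitative claim that shrinking margins make counterfactuals easier to find.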

Theorems & Definitions (15)

  • Definition 3.1: Decision Boundary
  • Definition 3.2: Geometric Margin
  • Definition 3.3: Counterfactual Example
  • Definition 3.4: $\varepsilon$-Valid Counterfactual Example -- $\varepsilon$-VCE
  • Lemma 3.1: $\varepsilon$-Valid Counterfactual Shell
  • proof
  • Definition 4.1: $\varepsilon$-Valid Counterfactual Probability -- $\varepsilon$-VCP
  • Theorem 4.1
  • proof
  • Theorem 4.2
  • ...and 5 more