
CID: Measuring Feature Importance Through Counterfactual Distributions

Eddie Conti, Álvaro Parafita, Axel Brando

TL;DR

This work tackles the absence of a ground truth for local feature importance by introducing Counterfactual Importance Distribution (CID), a post-hoc method that generates positive and negative counterfactuals and models their per-feature distributions with kernel density estimation. Feature importance is derived from a distributional dissimilarity $d_1$ between the two counterfactual distributions, and the authors prove that $d_1$ is a metric under suitable conditions. Empirical results on the Diabetes and Heart datasets show that CID provides complementary explanations to DiCE, SHAP, and LIME, with improved faithfulness as measured by comprehensiveness and sufficiency. The framework is modular and flexible; the authors highlight that the choices of density estimator and counterfactual generator shape the outcomes, and they call for a pluralistic use of explanation methods in practice.

Abstract

Assessing the importance of individual features in Machine Learning is critical for understanding a model's decision-making process. While numerous methods exist, the lack of a definitive ground truth for comparison highlights the need for alternative, well-founded measures. This paper introduces a novel post-hoc local feature importance method called Counterfactual Importance Distribution (CID). We generate two sets of counterfactuals, one positive and one negative, model their distributions using Kernel Density Estimation, and rank features based on a distributional dissimilarity measure. This measure, grounded in a rigorous mathematical framework, satisfies the key properties required of a valid metric. We showcase the effectiveness of our method by comparing it with well-established local feature importance explainers. Our method not only offers complementary perspectives to existing approaches, but also improves performance on faithfulness metrics (both comprehensiveness and sufficiency), yielding more faithful explanations of the system. These results highlight its potential as a valuable tool for model analysis.
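A minimal sketch of the ranking step described above, in plain Python. It assumes the two counterfactual sets $C^+$ and $C^-$ are already given as lists of feature vectors (for example, produced by a generator such as DiCE), and it assumes the per-feature dissimilarity is one minus the overlap coefficient $\int \min\{p,q\}$ of the two KDEs, i.e. the grey area in Figure 1. The paper's exact definition of $d_1$, its bandwidth choice, and its kernel set may differ; all function names here are hypothetical.

```python
import math
import random

def gaussian_kernel(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def epanechnikov_kernel(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def silverman_bandwidth(xs):
    """Normal-reference rule of thumb (tuned for the Gaussian kernel)."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / max(n - 1, 1))
    return 1.06 * std * n ** (-0.2) if std > 0 else 1e-3

def kde(samples, kernel=gaussian_kernel):
    """Return a 1-D kernel density estimate and its bandwidth."""
    h = silverman_bandwidth(samples)
    n = len(samples)
    def density(x):
        return sum(kernel((x - s) / h) for s in samples) / (n * h)
    return density, h

def overlap(samples_p, samples_q, kernel=gaussian_kernel, grid_size=512):
    """Approximate the overlap coefficient, the integral of min(p, q), by the trapezoid rule."""
    p, hp = kde(samples_p, kernel)
    q, hq = kde(samples_q, kernel)
    pad = 4.0 * max(hp, hq)  # extend the grid a few bandwidths past the data
    lo = min(min(samples_p), min(samples_q)) - pad
    hi = max(max(samples_p), max(samples_q)) + pad
    step = (hi - lo) / (grid_size - 1)
    vals = [min(p(lo + i * step), q(lo + i * step)) for i in range(grid_size)]
    area = step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
    return min(max(area, 0.0), 1.0)  # clamp small numerical error

def cid_importance(pos_cfs, neg_cfs, feature_names, kernel=gaussian_kernel):
    """Score each feature by 1 - overlap of its C+ and C- marginals; sort descending."""
    scores = {}
    for j, name in enumerate(feature_names):
        pos_j = [row[j] for row in pos_cfs]
        neg_j = [row[j] for row in neg_cfs]
        scores[name] = 1.0 - overlap(pos_j, neg_j, kernel)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

# Synthetic counterfactuals: feature "a" separates C+ from C-, feature "b" does not.
random.seed(0)
pos = [[random.gauss(2.0, 0.5), random.gauss(0.0, 1.0)] for _ in range(200)]
neg = [[random.gauss(-2.0, 0.5), random.gauss(0.0, 1.0)] for _ in range(200)]
scores = cid_importance(pos, neg, ["a", "b"])
print(scores)  # "a" should rank first with a score near 1
```

Passing `kernel=epanechnikov_kernel` mirrors the kernel comparison in Figures A.1 and A.2; note that the Silverman bandwidth used here is tuned for the Gaussian kernel, so other kernels may warrant a different bandwidth.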

Paper Structure

This paper contains 20 sections, 5 theorems, 50 equations, 7 figures, 8 tables.

Key Result

Proposition 3.1

Let $p,q$ be two real probability distributions with $\text{supp}(p)=[a,b]$ and $\text{supp}(q)=[a,c]$, where $b<c$. Moreover, assume that … . Then $o(p,q) \to 1$, and so $d_1(p,q) \to 0$, as $\epsilon, \delta \to 0$.
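As a worked special case (an illustration only, not the proposition itself, whose conditions on $\epsilon$ and $\delta$ are not reproduced here), assume $o(p,q)$ denotes the overlap coefficient $\int \min\{p,q\}$ and $d_1 = 1 - o$, and take $p$ and $q$ uniform on their supports:

```latex
\begin{aligned}
p &= \tfrac{1}{b-a}\,\mathbf{1}_{[a,b]}, \qquad q = \tfrac{1}{c-a}\,\mathbf{1}_{[a,c]}, \qquad b<c,\\
o(p,q) &= \int_a^c \min\{p(x),q(x)\}\,dx = \int_a^b \frac{dx}{c-a} = \frac{b-a}{c-a}
  \quad\text{(since } \tfrac{1}{c-a}<\tfrac{1}{b-a} \text{ on } [a,b] \text{ and } p=0 \text{ on } (b,c]),\\
c = b+\epsilon \;\Rightarrow\; o(p,q) &= \frac{b-a}{b-a+\epsilon} \xrightarrow[\epsilon\to 0]{} 1,
  \qquad d_1(p,q) = 1-o(p,q) = \frac{\epsilon}{b-a+\epsilon} \xrightarrow[\epsilon\to 0]{} 0.
\end{aligned}
```

In this special case, when the two supports nearly coincide, almost all of the probability mass is shared.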

Figures (7)

  • Figure 1: Dissimilarity analysis for the Diabetes dataset. KDEs of CF entries from $C^+$ (blue) and $C^-$ (orange) for the first test sample; the grey area indicates their overlap. This visualization highlights how feature distributions differ between positive and negative CFs. For instance, SkinThickness and PedigreeFunction show noticeable shifts, while variables like Insulin and Glucose exhibit no clear pattern.
  • Figure 2: Distribution of feature importance values over $100$ iterations, according to CID and DiCE, for the first test entry of the Diabetes and Heart datasets, using LogisticRegression and RandomForestClassifier as models, respectively.
  • Figure 3: The trend of $k(x,e)$ as the relevant features, ranked by each explanation, are gradually removed, for an instance of the Diabetes test set. A downward trend indicates that the removed features actually played a relevant role in determining the class of the instance under consideration.
  • Figure A.1: The distribution of importance values according to the various types of kernels considered for the Diabetes dataset.
  • Figure A.2: The distribution of importance values according to the various types of kernels considered for the Heart dataset.
  • ...and 2 more figures
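Figure 3 describes a deletion-style faithfulness check. This excerpt does not define $k(x,e)$, so the sketch below assumes it is the model's probability for the originally predicted class, that "removing" a feature means resetting it to a baseline value, and that comprehensiveness and sufficiency are the commonly used definitions that average the probability drop over $k$. The toy logistic model, the zero baseline, and all function names are hypothetical.

```python
import math

def make_logistic(weights, bias):
    """Toy logistic classifier returning P(class 1); a hypothetical stand-in for the real model."""
    def prob_fn(x):
        z = bias + sum(w * v for w, v in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))
    return prob_fn

def _class_prob(prob_fn, x, target):
    p = prob_fn(x)
    return p if target == 1 else 1.0 - p

def deletion_curve(x, ranking, baseline, prob_fn):
    """Probability of the originally predicted class after removing the top-k features, k = 0..d."""
    target = 1 if prob_fn(x) >= 0.5 else 0
    current = list(x)
    curve = [_class_prob(prob_fn, current, target)]
    for j in ranking:
        current[j] = baseline[j]  # "remove" feature j by resetting it to the baseline
        curve.append(_class_prob(prob_fn, current, target))
    return curve

def comprehensiveness(x, ranking, baseline, prob_fn):
    """Average drop when the top-k features are removed (higher = more faithful)."""
    curve = deletion_curve(x, ranking, baseline, prob_fn)
    return sum(curve[0] - c for c in curve[1:]) / (len(curve) - 1)

def sufficiency(x, ranking, baseline, prob_fn):
    """Average drop when only the top-k features are kept (lower = more faithful)."""
    target = 1 if prob_fn(x) >= 0.5 else 0
    full = _class_prob(prob_fn, x, target)
    drops = []
    for k in range(1, len(ranking) + 1):
        kept = set(ranking[:k])
        reduced = [x[j] if j in kept else baseline[j] for j in range(len(x))]
        drops.append(full - _class_prob(prob_fn, reduced, target))
    return sum(drops) / len(drops)

model = make_logistic([3.0, 0.5, 0.0], bias=0.0)
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
good, bad = [0, 1, 2], [2, 1, 0]  # a faithful ranking versus a reversed one
print(deletion_curve(x, good, baseline, model))  # drops quickly for the faithful ranking
print(comprehensiveness(x, good, baseline, model) > comprehensiveness(x, bad, baseline, model))  # True
print(sufficiency(x, good, baseline, model) < sufficiency(x, bad, baseline, model))  # True
```

A ranking that puts the truly influential feature first produces the steeper downward curve that Figure 3 associates with relevant features.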

Theorems & Definitions (12)

  • Definition 1
  • Proposition 3.1
  • Proof
  • Remark 3.1
  • Example 3.1
  • Proposition 3.2
  • Theorem 3.1
  • Theorem 3.2
  • Proof
  • Proof
  • ...and 2 more