Confident Feature Ranking

Bitya Neuhof, Yuval Benjamini

TL;DR

The paper tackles instability in post-hoc feature importance (FI) rankings by formalizing a base-to-global framework that separates the variability of base FI values from the global FI values they are aggregated into. It introduces procedures that construct simultaneous confidence intervals for the true feature ranks from pairwise tests with family-wise error rate (FWER) control (Holm, Min-P), enabling reliable top-k selection even under correlation and non-normality. The authors validate the approach with synthetic experiments, SHAP/TreeSHAP-based simulations, and real data (bike sharing, COMPAS, Nomao), reporting high simultaneous coverage and competitive efficiency while noting practical considerations such as runtime and tail behavior. The result is a principled, interpretable way to quantify ranking uncertainty in FI analyses, with implications for more stable model explanations and feature selection decisions.

Abstract

Machine learning models are widely applied in various fields. Stakeholders often use post-hoc feature importance methods to better understand the input features' contribution to the models' predictions. The interpretation of the importance values provided by these methods is frequently based on the relative order of the features (their ranking) rather than the importance values themselves. Since the order may be unstable, we present a framework for quantifying the uncertainty in global importance values. We propose a novel method for the post-hoc interpretation of feature importance values that is based on the framework and pairwise comparisons of the feature importance values. This method produces simultaneous confidence intervals for the features' ranks, which include the "true" (infinite sample) ranks with high probability, and enables the selection of the set of the top-k important features.
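
To make the idea of rank intervals built from pairwise comparisons concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `holm_reject` and `rank_cis` are hypothetical, the pairwise test is assumed to be a paired t-test on per-observation base FI values, and Holm's step-down procedure is used for FWER control. Ranks follow the convention that rank p is the most important feature.

```python
import numpy as np
from scipy import stats


def holm_reject(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by Holm's step-down procedure."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    reject = np.zeros(m, dtype=bool)
    for step, idx in enumerate(np.argsort(pvals)):
        if pvals[idx] <= alpha / (m - step):
            reject[idx] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject


def rank_cis(base_fi, alpha=0.05):
    """Rank bounds from an (n, p) array of base FI values (hypothetical helper).

    Assumption: each pair of features is compared with a paired t-test on
    the per-observation differences. Rank p denotes the most important feature.
    """
    n, p = base_fi.shape
    pairs = [(j, l) for j in range(p) for l in range(j + 1, p)]
    pvals, signs = [], []
    for j, l in pairs:
        diff = base_fi[:, j] - base_fi[:, l]
        pvals.append(stats.ttest_1samp(diff, 0.0).pvalue)
        signs.append(np.sign(diff.mean()))
    reject = holm_reject(pvals, alpha)

    lower = np.ones(p, dtype=int)      # start from the widest interval [1, p]
    upper = np.full(p, p, dtype=int)
    for (j, l), rej, s in zip(pairs, reject, signs):
        if not rej:
            continue
        hi, lo = (j, l) if s > 0 else (l, j)
        lower[hi] += 1  # hi is significantly more important than lo
        upper[lo] -= 1  # lo is significantly less important than hi
    return lower, upper


# Synthetic base FI values with well-separated means (illustrative data only).
rng = np.random.default_rng(0)
base_fi = rng.normal(loc=[0.0, 1.0, 5.0], scale=0.5, size=(200, 3))
lower, upper = rank_cis(base_fi)
```

Each rejected pairwise comparison tightens one lower bound and one upper bound, so features that cannot be separated from their neighbors keep wide intervals, which is how the uncertainty in the ordering shows up.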

Paper Structure

This paper contains 71 sections, 2 theorems, 10 equations, 18 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

Let $\{ [L_1, U_1],\ldots,[L_p,U_p] \}$ be $1-\alpha$ simultaneous CIs for the true ranks. Define the top-$k$ set $\widehat{\mathcal{T}}_k = \{j: U_j \geq p-k+1 \}$. This set includes all features whose upper bound lies in the top-$k$ ranks $(p, p - 1, \ldots, p - k + 1)$. Then $\mathbb{P}(\mathcal{T}_k
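
The set definition in Lemma 1 amounts to a simple filter on the upper rank bounds. The sketch below illustrates it with made-up bounds; the function name `top_k_set` and the 0-based feature indices are assumptions for illustration, and rank $p$ is taken to be the most important feature, as in the lemma.

```python
def top_k_set(upper, k):
    """Indices j with U_j >= p - k + 1 (hypothetical helper, 0-based indices)."""
    p = len(upper)
    return [j for j, u in enumerate(upper) if u >= p - k + 1]


# Illustrative upper bounds for p = 5 features (not taken from the paper).
upper_bounds = [1, 3, 4, 5, 5]
selected = top_k_set(upper_bounds, k=2)  # threshold is 5 - 2 + 1 = 4
print(selected)  # [2, 3, 4]
```

In this example the selected set has three members for $k = 2$: a feature whose interval still reaches into the top-$k$ ranks cannot be excluded, so the set can be larger than $k$.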

Figures (18)

  • Figure 1: Bar plots of SHAP values for two samples of $n=50$ observations from the bike sharing dataset using an XGBoost model. The ranking of the features is unstable for this sample size: the ranking of the Workingday and Month features varies depending on the sample. The chances of observing each of the rankings (69.4% and 29.1%) are estimated based on 1,000 independent samples of size $n=50$.
  • Figure 2: Base FI values are sampled as vectors, introducing uncertainty. The vectors are averaged to form the observed global FI values. Finally, the global FI values are ranked to produce the observed ranks.
  • Figure 3: Feature ranking and evaluation process.
  • Figure 4: Ranking efficiency as a function of $n$ for multiple $\sigma$-factors and three ranking methods. Low values mean smaller sets and are therefore better. The methods' efficiency is similar.
  • Figure 5: Ranking efficiency with low (a) and high (b) $\sigma$-factors, as a function of $n$ for multiple levels of correlations ($\rho$) and three ranking methods.
  • ...and 13 more figures

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Definition 3
  • Lemma 1
  • Definition 4
  • Theorem 1