Conditional Feature Importance revisited: Double Robustness, Efficiency and Inference

Angel Reyero-Lobo, Pierre Neuvial, Bertrand Thirion

TL;DR

This work provides a theoretical and empirical consolidation of Conditional Feature Importance (CFI), showing that Conditional Permutation Importance (CPI) is a valid CFI under proper conditional sampling. It reveals a double robustness property that aids variable selection, links CPI to the Total Sobol Index through Sobol-CPI (SCPI) for nonparametric efficiency, and offers bias corrections and valid inference procedures. The Sobol-CPI framework achieves asymptotic efficiency and consistent type-I error control, while experiments illustrate improved null-detection and competitive power without excessive computational cost. Overall, the paper strengthens the theoretical foundations of CFI and delivers practical tools for reliable variable importance assessment and inference.

Abstract

Conditional Feature Importance (CFI) was introduced long ago to account for the relationship between the studied feature and the rest of the input. However, CFI has not yet been studied from a theoretical perspective because the conditional sampling step has generally been overlooked. In this article, we demonstrate that the recent Conditional Permutation Importance (CPI) is indeed a valid implementation of this concept. Under the conditional null hypothesis, we then establish a double robustness property that can be leveraged for variable selection: with either a valid model or a valid conditional sampler, the method correctly identifies null coordinates. Under the alternative hypothesis, we study the theoretical target and link it to the popular Total Sobol Index (TSI). We introduce the Sobol-CPI, which generalizes CPI/CFI, prove that it is nonparametrically efficient, and provide a bias correction. Finally, we propose a consistent and valid type-I error test and present numerical experiments that illustrate our findings.
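
To make the conditional sampling step concrete, here is a minimal sketch of a CPI-style estimate under strong simplifying assumptions: both the predictive model and the conditional sampler are plain least-squares fits, and the loss is squared error. The helper names (`fit_linear`, `cpi`, `demo`) and the simulated data are hypothetical and not taken from the paper. The idea illustrated is to regress $X_j$ on the remaining coordinates, permute the residuals, and compare the loss on the reconstructed input with the loss on the original input.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def cpi(model, X_train, X_test, y_test, j, rng):
    """CPI-style score for coordinate j with squared loss (illustrative)."""
    rest = [k for k in range(X_test.shape[1]) if k != j]
    # Conditional sampler (assumption: linear stand-in): predict X_j from X_{-j}.
    nu = fit_linear(X_train[:, rest], X_train[:, j])
    pred_j = nu(X_test[:, rest])
    residuals = X_test[:, j] - pred_j
    # Permuting residuals keeps the dependence of X_j on X_{-j}
    # while breaking its remaining link with y.
    X_tilde = X_test.copy()
    X_tilde[:, j] = pred_j + rng.permutation(residuals)
    loss_perm = (y_test - model(X_tilde)) ** 2
    loss_orig = (y_test - model(X_test)) ** 2
    return float(np.mean(loss_perm - loss_orig))

def demo(seed=0, n=2000, rho=0.8):
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho, 0.0], [rho, 1.0, 0.0], [0.0, 0.0, 1.0]])
    X = rng.multivariate_normal(np.zeros(3), cov, size=n)
    # Column 1 is correlated with column 0 but is conditionally null.
    y = 2.0 * X[:, 0] + X[:, 2] + 0.5 * rng.normal(size=n)
    half = n // 2
    model = fit_linear(X[:half], y[:half])
    return [cpi(model, X[:half], X[half:], y[half:], j, rng) for j in range(3)]

scores = demo()
print(scores)
```

In this toy setting the score of the correlated but conditionally null column stays close to zero, while the two columns that enter $y$ receive clearly positive scores. A flexible learner or a nonlinear conditional sampler could be substituted for `fit_linear` without changing the structure of `cpi`.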

Paper Structure

This paper contains 47 sections, 16 theorems, 78 equations, and 13 figures.

Key Result

Lemma 3.2

For a Gaussian vector $X$, the additive innovation assumption is satisfied.
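
For context, the lemma is consistent with the standard Gaussian conditioning identity. The statement below is a sketch in our own notation ($\mu$, $\Sigma$, $\nu_{-j}$ and $\epsilon_j$ are not taken from the paper), and it reads "additive innovation" as a decomposition of $X_j$ into a function of the other coordinates plus an independent noise term. For $X \sim \mathcal{N}(\mu, \Sigma)$ and a coordinate $j$,

$$
X_j = \nu_{-j}(X_{-j}) + \epsilon_j, \qquad \nu_{-j}(X_{-j}) = \mu_j + \Sigma_{j,-j}\,\Sigma_{-j,-j}^{-1}\,(X_{-j} - \mu_{-j}),
$$

$$
\epsilon_j \sim \mathcal{N}\!\left(0,\; \Sigma_{j,j} - \Sigma_{j,-j}\,\Sigma_{-j,-j}^{-1}\,\Sigma_{-j,j}\right), \qquad \epsilon_j \perp X_{-j}.
$$

Because $\epsilon_j$ is independent of $X_{-j}$, the residuals of the regression of $X_j$ on $X_{-j}$ can be permuted across samples without changing the conditional distribution of $X_j$ given $X_{-j}$.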

Figures (13)

  • Figure 1: Double robustness for complex learners: left to right: TSI estimates for an important covariate ($X_0$) and a null covariate ($X_6$); AUC for an importance-based variable selection; bias for null covariates. Sobol-CPI converges at similar rates to TSI and, under the null, it converges faster.
  • Figure 2: Statistical Inference on variable importance in a linear setting with correlation: AUC for variable selection accuracy, bias in non-null TSI estimation, power and type-I error. Sobol-CPI(1) provides the most powerful test. Using the corrected variance, the type-I error is controlled.
  • Figure 3: Asymptotic relevance on standard ML models: Permutation Feature Importance (PFI) mean, with standard deviation in parentheses, for $X_1$ (left) and $X_2$ (right) across different correlation levels and models, where $Y = X_1^2 + \epsilon$, $\epsilon \sim \mathcal{N}(0, 0.2)$, and $X \sim \mathcal{N}(0, \Sigma)$ with $\Sigma_{1,2} \in \{0, 0.3, 0.6, 0.9\}$. The sample size is $n=1000$, with 80% used for training and 20% for testing. The experiment was repeated 100 times.
  • Figure 4: Double robustness of the Sobol-CPI: The empirical bias distribution of LOCO, Sobol-CPI(1), and Sobol-CPI(100). From (a) and (b), we observe the benefits of using a CPI-based approach, as its double robustness results in lower bias. In (c) and (d), we see that the estimation error for a non-null covariate is similar. Comparing (a) and (b), we observe the negative effect of data splitting.
  • Figure 5: Calibration set size effect as a trade-off between Variable Importance and Variable Selection: Total Sobol Index estimation in a nonlinear setting. The first figure represents an important covariate ($X_0$), while the second represents a non-important covariate ($X_6$). We observe that with a larger $n_\mathrm{cal}$, the importance estimation of the non-null covariate is slightly improved, enhancing variable importance. However, for the null covariate, there is a slightly greater bias, making variable selection less accurate.
  • ...and 8 more figures
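
The simulation setting described in the caption of Figure 3 can be sketched as follows. This is only an illustration under stated assumptions: the 0.2 in $\mathcal{N}(0, 0.2)$ is read as a variance, $\Sigma$ is taken to have unit diagonal, and a quadratic least-squares fit stands in for the unspecified "standard ML models". The function names are hypothetical, and a single run per correlation level replaces the 100 repetitions.

```python
import numpy as np

def simulate(rho, n=1000, seed=0):
    """Draw (X, y) with Y = X_1^2 + eps and corr(X_1, X_2) = rho."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])  # assumption: unit variances
    X = rng.multivariate_normal(np.zeros(2), cov, size=n)
    # Assumption: 0.2 is the noise variance, so the scale is sqrt(0.2).
    y = X[:, 0] ** 2 + rng.normal(scale=np.sqrt(0.2), size=n)
    return X, y

def features(X):
    """Intercept, linear and squared terms (stand-in model class)."""
    return np.column_stack([np.ones(len(X)), X, X ** 2])

def pfi(predict, X, y, j, rng):
    """Permutation Feature Importance of column j under squared loss."""
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X[:, j])  # marginal, not conditional
    return float(np.mean((y - predict(X_perm)) ** 2 - (y - predict(X)) ** 2))

def run(rho, seed=0):
    X, y = simulate(rho, seed=seed)
    n_train = int(0.8 * len(X))  # 80% train, 20% test
    coef, *_ = np.linalg.lstsq(features(X[:n_train]), y[:n_train], rcond=None)
    predict = lambda Z: features(Z) @ coef
    rng = np.random.default_rng(seed + 1)
    # Index 0 is X_1 and index 1 is X_2 in the caption's notation.
    return [pfi(predict, X[n_train:], y[n_train:], j, rng) for j in range(2)]

results = {rho: run(rho) for rho in (0.0, 0.3, 0.6, 0.9)}
for rho, (pfi_x1, pfi_x2) in results.items():
    print(rho, round(pfi_x1, 3), round(pfi_x2, 3))
```

With this well-specified stand-in model, the PFI of $X_2$ stays near zero at every correlation level. Other learners may behave differently, and comparing models is the purpose of the figure.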

Theorems & Definitions (35)

  • Definition 2.1: TSI
  • Definition 2.2: LOCO
  • Definition 2.3: CPI
  • Lemma 3.2: Gaussian additive noise
  • Proposition 3.3: Empirical conditional sampling
  • Theorem 3.6: Double robustness
  • Proposition 3.8
  • Proposition 3.9
  • Lemma 3.11
  • Lemma 3.12
  • ...and 25 more