A general framework for inference on algorithm-agnostic variable importance

Brian D. Williamson, Peter B. Gilbert, Noah R. Simon, Marco Carone

TL;DR

This work delivers a unified, model-agnostic framework to quantify intrinsic variable importance via a population-level contrast in oracle predictiveness, $\psi_{0,s}=V(f_0,P_0)-V(f_{0,s},P_0)$, independent of any single prediction algorithm. It develops nonparametric, efficient plug-in estimators for a broad class of predictiveness measures, establishes their asymptotic efficiency under regularity conditions, and introduces cross-fitting to accommodate flexible machine-learning-based nuisance estimation. The authors further provide strategies for valid inference under the zero-importance null hypothesis and extend the approach to complex settings, including causal and missing-data scenarios. Through simulations, they show favorable operating characteristics, and they illustrate the framework by analyzing HIV-1 antibody resistance features, revealing which feature groups most drive predictive performance. Overall, the framework enables robust, algorithm-agnostic assessment of variable importance with concrete guidance for inference and practical deployment in high-stakes scientific studies.

Abstract

In many applications, it is of interest to assess the relative contribution of features (or subsets of features) toward the goal of predicting a response -- in other words, to gauge the variable importance of features. Most recent work on variable importance assessment has focused on describing the importance of features within the confines of a given prediction algorithm. However, such assessment does not necessarily characterize the prediction potential of features, and may provide a misleading reflection of the intrinsic value of these features. To address this limitation, we propose a general framework for nonparametric inference on interpretable algorithm-agnostic variable importance. We define variable importance as a population-level contrast between the oracle predictiveness of all available features versus all features except those under consideration. We propose a nonparametric efficient estimation procedure that allows the construction of valid confidence intervals, even when machine learning techniques are used. We also outline a valid strategy for testing the null importance hypothesis. Through simulations, we show that our proposal has good operating characteristics, and we illustrate its use with data from a study of an antibody against HIV-1 infection.
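The contrast described in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the authors' implementation: it uses classification accuracy as the predictiveness measure, a toy nearest-centroid learner as a stand-in for a flexible estimator of the oracle prediction function, and simulated data. All function names (`fit_centroids`, `cv_accuracy`, `vim`) are hypothetical.

```python
import random

def fit_centroids(X, y, cols):
    """Toy learner (an assumption, not the paper's): nearest class centroid on columns `cols`."""
    cents = {}
    for label in sorted(set(y)):
        rows = [x for x, yi in zip(X, y) if yi == label]
        cents[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(x):
        # Pick the label whose centroid is closest in squared Euclidean distance.
        return min(cents, key=lambda lab: sum((x[c] - m) ** 2
                                              for c, m in zip(cols, cents[lab])))
    return predict

def cv_accuracy(X, y, cols, folds):
    """Cross-fitted plug-in accuracy: fit on the other folds, score on the held-out fold, average."""
    n_folds = max(folds) + 1
    scores = []
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        predict = fit_centroids([X[i] for i in train], [y[i] for i in train], cols)
        scores.append(sum(predict(X[i]) == y[i] for i in test) / len(test))
    return sum(scores) / n_folds

def vim(X, y, s, folds):
    """Estimated importance of the feature set `s`: predictiveness with all features
    minus predictiveness with the features in `s` removed."""
    full = list(range(len(X[0])))
    reduced = [c for c in full if c not in s]
    return cv_accuracy(X, y, full, folds) - cv_accuracy(X, y, reduced, folds)

# Simulated data (hypothetical): only feature 0 carries signal.
rng = random.Random(0)
n = 600
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [int(2.0 * x[0] + rng.gauss(0, 1) > 0) for x in X]
folds = [i % 5 for i in range(n)]

vim_signal = vim(X, y, {0}, folds)  # importance of the informative feature
vim_noise = vim(X, y, {2}, folds)   # importance of a pure-noise feature
print(f"feature 0: {vim_signal:.3f}, feature 2: {vim_noise:.3f}")
```

In this simulation the estimate for feature 0 should be clearly positive and the estimate for feature 2 close to zero. The learner is deliberately simple; the framework itself is stated for flexible estimators of the oracle prediction functions.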

Paper Structure

This paper contains 38 sections, 3 theorems, 60 equations, 23 figures, 6 tables, 3 algorithms.

Key Result

Theorem 1

If conditions (A1)--(A2) and (B1)--(B3) hold, then $v_n$ is an asymptotically linear estimator of $v_0$ with influence function equal to $\phi_{0}: z \mapsto \dot{V}(f_0,P_0; \delta_z - P_0)$, that is, $v_n - v_0 = \frac{1}{n}\sum_{i=1}^{n}\phi_0(Z_i) + o_P(n^{-1/2})$ for observations $Z_1,\ldots,Z_n$ under sampling from $P_0$. If conditions (A3)--(A4) also hold, then $\phi_0$ coincides with the nonparametric efficient influence function (EIF) of $P\mapsto V(f_P,P)$ at $P_0$, and so, $v_n$ is no
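One practical consequence of asymptotic linearity is a Wald-type confidence interval built from the empirical variance of the influence function. The sketch below is a minimal illustration under assumptions that are not taken from the paper's text here: the predictiveness measure is accuracy, so that for a fixed prediction function $f$ the influence function at $z=(x,y)$ is $1\{f(x)=y\}-v$, and the input is a vector of 0/1 correctness indicators from held-out data. The function name `wald_ci` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def wald_ci(correct, level=0.95):
    """Wald-type interval for accuracy from 0/1 correctness indicators.

    Assumes the prediction function is treated as fixed, so the estimated
    influence function values are phi_i = correct_i - v_n.
    """
    n = len(correct)
    v_n = sum(correct) / n                 # plug-in estimate of accuracy
    phi = [c - v_n for c in correct]       # estimated influence function values
    sigma2 = sum(p * p for p in phi) / n   # empirical variance of phi
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(sigma2 / n)
    return v_n, (v_n - half_width, v_n + half_width)

# Hypothetical held-out indicators: each prediction is correct with probability 0.8.
rng = random.Random(1)
correct = [int(rng.random() < 0.8) for _ in range(1000)]
v_n, (lower, upper) = wald_ci(correct)
print(f"accuracy {v_n:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

This interval targets the predictiveness $v_0$ itself. For an importance contrast between two predictiveness values, the analogous variance is degenerate when the true importance is zero, which is the situation the sample-splitting strategy in Figure 1 is described as addressing.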

Figures (23)

  • Figure 1: Illustration of dataset subdivision when sample-splitting and cross-fitting are used simultaneously for valid inference under the zero-importance hypothesis (sample-splitting) without requiring Donsker class conditions (cross-fitting). Each row represents the entire dataset with a different subset singled out (in grey) as testing set. To estimate $v_0$, the top three rows are used. In each such row, $f_0$ is estimated using data in the white cells, and $v_0$ is estimated using the resulting estimate of $f_0$ and data in the grey cells. Row-specific estimates of $v_0$ are then averaged. The process is repeated for estimating $v_{0,s}$ but instead using the bottom three rows and estimating $f_{0,s}$ rather than $f_0$.
  • Figure 2: Performance of plug-in estimators for estimating (non-zero) importance of $X_2$ in terms of accuracy under Scenario 1 (all features have non-zero importance). Clockwise from top left: empirical bias of the proposed plug-in estimator scaled by $n^{1/2}$; empirical variance scaled by $n$; empirical coverage of nominal 95% confidence intervals; and average width of these intervals. Circles, triangles, squares and plus symbols denote estimators based on the use of generalized additive models (GAMs), probit regression (GLM), random forests (RF), and the Super Learner (SL), respectively. Blue and green symbols denote non-cross-fitted and cross-fitted estimators, respectively.
  • Figure 3: Performance of plug-in estimators for estimating (zero) importance of $X_3$ in terms of accuracy under Scenario 2. Clockwise from top left: empirical bias of the proposed plug-in estimator scaled by $n^{1/2}$; empirical variance scaled by $n$; empirical coverage of nominal 95% confidence intervals; and empirical type I error of the proposed hypothesis test. Circles, triangles, squares and plus symbols denote estimators based on the use of generalized additive models (GAMs), probit regression (GLM), random forests (RF), and the Super Learner (SL), respectively. Blue and green symbols denote non-cross-fitted and cross-fitted estimators, respectively.
  • Figure 4: Variable importance measured by accuracy (panel A) and AUC (panel B) for the groups defined in panel C. Stars denote importance deemed statistically significantly different from zero at the 0.0038 (0.05 / 13) level.
  • Figure 5: Performance of plug-in estimators for estimating (non-zero) importance of $X_1$ in terms of accuracy under Scenario 1 (all features have non-zero importance). Clockwise from top left: empirical bias of the proposed plug-in estimator scaled by $n^{1/2}$; empirical variance scaled by $n$; empirical coverage of nominal 95% confidence intervals; and width of these intervals. Circles, triangles, squares, and plus symbols denote estimators based on the use of generalized additive models (GAMs), probit regression (GLM), random forests (RF) or the Super Learner (SL), respectively. Blue and green symbols denote non-cross-fitted and cross-fitted estimators, respectively. This figure appears in color in the electronic version of this article.
  • ...and 18 more figures
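The subdivision described in the Figure 1 caption can be sketched as index bookkeeping. This is one reading of the caption, not a confirmed reproduction of the authors' algorithm: the data are cut into $2K$ folds (here $K=3$, matching the three top and three bottom rows), the first $K$ folds serve as test sets for $v_0$ and the last $K$ for $v_{0,s}$. The caption does not say which folds are the white training cells, so the sketch assumes training uses the other folds in the same half. The function name `sample_split_rows` is hypothetical.

```python
import random

def sample_split_rows(n, K=3, seed=0):
    """Return (full_rows, reduced_rows); each row is a (train_idx, test_idx) pair.

    Assumption: within a row, training data are the other folds of the same
    half. The figure caption does not spell this out.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[j::2 * K] for j in range(2 * K)]  # 2K roughly equal folds

    def rows(fold_ids):
        out = []
        for k in fold_ids:
            test = folds[k]  # the grey cell of this row
            train = [i for j in fold_ids if j != k for i in folds[j]]
            out.append((train, test))
        return out

    full_rows = rows(range(K))            # rows used to estimate v_0
    reduced_rows = rows(range(K, 2 * K))  # rows used to estimate v_{0,s}
    return full_rows, reduced_rows

full_rows, reduced_rows = sample_split_rows(60, K=3)
for name, rows_ in (("full", full_rows), ("reduced", reduced_rows)):
    for train, test in rows_:
        print(name, "train size", len(train), "test size", len(test))
```

With each row in hand, one would fit the relevant prediction function on `train`, evaluate the predictiveness measure on `test`, and average the row-specific values, as the caption describes. Because the two sets of test folds are disjoint, the estimates of $v_0$ and $v_{0,s}$ are evaluated on separate observations.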

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3