Model-Agnostic Confidence Intervals for Feature Importance: A Fast and Powerful Approach Using Minipatch Ensembles

Luqin Gan, Lili Zheng, Genevera I. Allen

TL;DR

This work develops LOCO-MP, a model-agnostic, distribution-free framework for inferring feature importance by leveraging minipatch ensembles to perform leave-one-covariate-out evaluations without data splitting or refitting on the full dataset. It establishes asymptotic coverage for the feature-importance score $\Delta_j$ under mild assumptions, and introduces a variance-barrier safeguard to handle vanishing-variance scenarios, while also enabling distribution-free predictive inference via Jackknife+-minipatch conformal methods. The approach demonstrates favorable statistical power, computational efficiency, and robust performance in simulations and real benchmark data, including correlated-feature settings where traditional occlusion methods struggle. Together, the method provides fast, reliable uncertainty quantification for both predictions and feature importance across tabular regression and classification tasks, with broad applicability to complex scientific datasets.

Abstract

To promote new scientific discoveries from complex data sets, feature importance inference has been a long-standing statistical problem. Instead of testing for parameters that are only interpretable for specific models, there has been increasing interest in model-agnostic methods, often in the form of feature occlusion or leave-one-covariate-out (LOCO) inference. Existing approaches often make distributional assumptions, which can be difficult to verify in practice, or require model refitting and data splitting, which are computationally intensive and lead to losses in power. In this work, we develop a novel, mostly model-agnostic and distribution-free inference framework for feature importance that is computationally efficient and statistically powerful. Our approach is fast as we avoid model refitting by leveraging a form of random observation and feature subsampling called minipatch ensembles; this approach also improves statistical power by avoiding data splitting. Our framework can be applied on tabular data and with any machine learning algorithm, together with minipatch ensembles, for regression and classification tasks. Despite the dependencies induced by using minipatch ensembles, we show that our approach provides asymptotic coverage for the feature importance score of any model under mild assumptions. Finally, our same procedure can also be leveraged to provide valid confidence intervals for predictions, hence providing fast, simultaneous quantification of the uncertainty of both predictions and feature importance. We validate our intervals on a series of synthetic and real data examples, including non-linear settings, showing that our approach detects the correct important features and exhibits many computational and statistical advantages over existing methods.
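
The idea of reusing one minipatch ensemble for leave-one-covariate-out inference can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the function names `fit_minipatches` and `loco_mp_interval`, the least-squares base learner, the absolute-error loss, the patch sizes, and the normal-approximation interval. It should not be read as the authors' implementation.

```python
import numpy as np
from statistics import NormalDist


def fit_minipatches(X, y, K=500, n=50, m=3, seed=0):
    """Fit K least-squares learners, each on a random minipatch of n rows and m columns."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    in_rows = np.zeros((K, N), dtype=bool)  # in_rows[k, i]: patch k trained on observation i
    in_cols = np.zeros((K, M), dtype=bool)  # in_cols[k, j]: patch k used feature j
    preds = np.zeros((K, N))                # patch k's prediction for every observation
    for k in range(K):
        rows = rng.choice(N, size=n, replace=False)
        cols = rng.choice(M, size=m, replace=False)
        A = np.column_stack([np.ones(n), X[np.ix_(rows, cols)]])
        coef, *_ = np.linalg.lstsq(A, y[rows], rcond=None)
        preds[k] = coef[0] + X[:, cols] @ coef[1:]
        in_rows[k, rows] = True
        in_cols[k, cols] = True
    return preds, in_rows, in_cols


def loco_mp_interval(y, preds, in_rows, in_cols, j, alpha=0.1):
    """Point estimate and (1 - alpha) normal-approximation interval for feature j."""
    out_i = ~in_rows                   # patches that never trained on observation i
    out_ij = out_i & ~in_cols[:, [j]]  # ... and that also never used feature j
    if (out_ij.sum(axis=0) == 0).any():
        raise ValueError("increase K: some observation has no usable minipatch")
    # Leave-one-out prediction, with and without feature j, from the same fitted ensemble.
    mu_loo = (preds * out_i).sum(axis=0) / out_i.sum(axis=0)
    mu_loco = (preds * out_ij).sum(axis=0) / out_ij.sum(axis=0)
    # Assumed loss: absolute error. Positive values mean feature j helps prediction.
    delta = np.abs(y - mu_loco) - np.abs(y - mu_loo)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * delta.std(ddof=1) / np.sqrt(len(y))
    return delta.mean(), (delta.mean() - half, delta.mean() + half)


# Synthetic data (hypothetical): only feature 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=200)
preds, in_rows, in_cols = fit_minipatches(X, y)
for j in (0, 5):
    est, (lo, hi) = loco_mp_interval(y, preds, in_rows, in_cols, j)
    print(f"feature {j}: {est:.3f} [{lo:.3f}, {hi:.3f}]")
```

The ensemble is fitted once; every feature's interval is then obtained by re-averaging stored predictions, which is how this sketch avoids both refitting and data splitting.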

Paper Structure (55 sections, 13 theorems, 115 equations, 14 figures, 7 tables, 3 algorithms)

Key Result

Theorem 1

Suppose that all training data $(X_i, Y_i)\overset{i.i.d.}{\sim}\mathcal{P}$ and Assumptions assump:Lip through assump:mpnumber hold. If the sequence of random variables $\{[h_j(X_i,Y_i)-\mathbb{E}(h_j(X_i,Y_i))]^2/\sigma_j^2\}_{i=1}^N$ is uniformly integrable, then we have [...], where $\sigma_j^2 = \mathrm{Var}_{(X,Y)\sim \mathcal{P}}(h_j(X,Y))$ and $h_j(\cdot,\cdot)$ is defined in eq:h_j_abbrv_def.

Figures (14)

  • Figure 1: Histograms of $\Delta_j$ for a noise feature $j$, computed from simulated regression and classification data sets, under linear and non-linear data generating models and base learners. The dashed line is the Monte Carlo approximation for $\Delta_j^*$, and we can see that $\Delta_j$ is extremely close to $\Delta_j^*$, which is always negative except for the correlated feature settings. More details on the simulation set-ups and more histograms are included in the Appendix.
  • Figure 2: Coverage of 90% confidence intervals for the inference target (eq:target_inference) of a null feature in synthetic regression data, using ridge, decision tree, and kernel SVM as the base estimators. LOCO-MP has valid coverage near 0.9.
  • Figure 3: Width of 90% confidence intervals for the inference target (eq:target_inference) of a null feature in synthetic regression data, using ridge, decision tree, and kernel SVM as the base estimators. LOCO-MP has small interval width that decreases as the sample size $N$ increases.
  • Figure 4: Inference power on synthetic regression and classification data with $N=500$, $M = 200$. LOCO-MP controls type I error in the uncorrelated linear setting (power < 0.1 when SNR = 0) and is either the best or among the top performers in statistical power across all scenarios.
  • Figure 5: Feature inference (multiplicity adjusted) on the Wine quality regression data and African Heart Disease classification data, using the random forest as the base predictor. Features whose confidence interval has a lower bound greater than zero are statistically significant. A confidence interval with an upper bound below zero indicates that the feature would hamper prediction. LOCO-MP identifies important features that are consistent with previous studies on each data set and is among the best in interval efficiency, with the smallest widths.
  • ...and 9 more figures

Theorems & Definitions (21)

  • Theorem 1: Asymptotic distribution of $\bar{\Delta}_j$
  • Theorem 2: Consistent variance estimate
  • Corollary 1: Coverage and width of $\hat{\mathbb{C}}_j$
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Theorem 6: Valid inference for $\Delta_j^*$
  • Theorem 7: Distribution-free Predictive Inference Guarantee
  • Theorem 8
  • Theorem 9
  • ...and 11 more