Targeted Learning for Variable Importance
Xiaohan Wang, Yunzhe Zhou, Giles Hooker
TL;DR
This work advances uncertainty quantification for variable importance (VI) by embedding conditional permutation importance (CPI) within the targeted learning (TL) framework, yielding an estimator that is regular, asymptotically linear, and (under mild conditions) efficient. The authors derive efficient influence functions for CPI and related VI metrics, and introduce an iterative TL update that debiases the plug-in estimator while keeping computation practical. In simulations and two real-data applications (bike sharing and wine quality), the TL CPI estimator shows reduced bias and improved confidence interval coverage relative to traditional one-step or bootstrap approaches, at a modest cost in confidence interval length. The results support model-agnostic, nonparametric inference for CPI and offer practical guidance for robust interpretation of variable importance in complex ML models.
Abstract
Variable importance is one of the most widely used measures for interpreting machine learning models, and it has attracted significant interest from both the statistics and machine learning communities. Recently, increasing attention has been directed toward uncertainty quantification for these metrics. Current approaches largely rely on one-step procedures, which, while asymptotically efficient, can be sensitive and unstable in finite samples. To address these limitations, we propose a novel method based on the targeted learning (TL) framework, designed to make inference for variable importance metrics more robust. Our approach is particularly suited to conditional permutation variable importance. We show that it (i) retains the asymptotic efficiency of traditional methods, (ii) maintains comparable computational complexity, and (iii) delivers improved accuracy, especially in finite samples. We further support these findings with numerical experiments that illustrate the practical advantages of our method and validate the theoretical results.
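For orientation, the quantity being targeted can be illustrated with a minimal sketch of a plug-in conditional permutation importance estimate. This is not the authors' estimator and does not include the TL update. It assumes squared-error loss, and it approximates the conditional distribution of a feature given the others by a linear regression with permuted residuals, which is one common simplification. The function name `plug_in_cpi`, the simulated data, and the model `f` are all hypothetical.

```python
import numpy as np

def plug_in_cpi(f, X, y, j, rng):
    """Plug-in CPI for feature j: mean increase in squared-error loss
    when X_j is replaced by an approximate draw from X_j | X_{-j}."""
    X_minus_j = np.delete(X, j, axis=1)
    # Assumption: X_j | X_{-j} is modeled by a linear regression;
    # conditional draws are fitted values plus permuted residuals.
    A = np.column_stack([np.ones(len(X)), X_minus_j])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    fitted = A @ coef
    resid = X[:, j] - fitted
    X_tilde = X.copy()
    X_tilde[:, j] = fitted + rng.permutation(resid)
    loss_orig = (y - f(X)) ** 2
    loss_perm = (y - f(X_tilde)) ** 2
    return float(np.mean(loss_perm - loss_orig))

# Hypothetical simulated data: x0 is correlated with x1, x2 is irrelevant.
rng = np.random.default_rng(0)
n = 2000
x1 = rng.normal(size=n)
x0 = 0.5 * x1 + rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + rng.normal(size=n)

# Hypothetical fitted model; here it is simply the true regression function.
f = lambda Z: 2.0 * Z[:, 0]

cpi_0 = plug_in_cpi(f, X, y, j=0, rng=rng)  # feature the model uses
cpi_2 = plug_in_cpi(f, X, y, j=2, rng=rng)  # feature the model ignores
print(f"CPI(x0) = {cpi_0:.3f}, CPI(x2) = {cpi_2:.3f}")
```

In this sketch the importance of `x2` is exactly zero because `f` never reads that column, while the importance of `x0` is clearly positive. A plug-in estimate of this kind is the starting point that one-step and TL procedures then debias to obtain valid confidence intervals.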
