Leveraging Black-box Models to Assess Feature Importance in Unconditional Distribution
Jing Zhou, Chunlin Li
TL;DR
The paper tackles the challenge of assessing how features influence the unconditional distribution of an outcome when using pretrained black-box predictors. It defines the feature-importance curve β(τ) through a distributional, von Mises-type expansion and provides a post hoc plug-in estimator that reuses a pretrained predictor without retraining, augmented by density extrapolation for tail estimation. A sparsification step, stepwise backward pruning across a grid of quantiles, yields a sparse, interpretable set of features contributing to different parts of the distribution. Empirical results on synthetic and high-dimensional data show faithful, sparse β(τ) estimates and computational efficiency; the main limitations arise under heavy-tailed error distributions.
Abstract
Understanding how changes in explanatory features affect the unconditional distribution of the outcome is important in many applications. However, existing black-box predictive models are not readily suited to analyzing such questions. In this work, we develop an approximation method to compute feature importance curves for the unconditional distribution of the outcome, while leveraging the power of pre-trained black-box predictive models. The feature importance curves measure how the quantiles of the outcome distribution change in response to an external change in the explanatory features. Through extensive numerical experiments and real-data examples, we demonstrate that our approximation method produces sparse and faithful results and is computationally efficient.
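
To make the notion of a quantile-indexed importance curve concrete, the following is a minimal illustrative sketch, not the estimator developed in the paper. It uses a brute-force finite difference: shift one feature by a small amount for every observation, push the shifted data through the predictor, and compare the unconditional quantiles of the outcome before and after. The predictor `black_box_predict`, the helper `importance_curve`, the step size `delta`, and the simulated data are all hypothetical, and the sketch assumes that residuals are additive and unaffected by the shift.

```python
import numpy as np

def black_box_predict(X):
    # Hypothetical stand-in for a pretrained predictor; any callable mapping
    # an (n, p) array to n predictions could be used instead.
    return 2.0 * X[:, 0] + X[:, 1] ** 2

def importance_curve(predict, X, y, j, taus, delta=1e-2):
    """Finite-difference change in the unconditional tau-quantiles of the
    outcome when feature j is shifted by delta for every observation.
    Assumes additive residuals that do not change under the shift."""
    residuals = y - predict(X)              # held fixed under the shift
    X_shift = X.copy()
    X_shift[:, j] += delta                  # location shift of feature j
    y_shift = predict(X_shift) + residuals  # counterfactual outcomes
    return (np.quantile(y_shift, taus) - np.quantile(y, taus)) / delta

# Simulated data: feature 0 acts linearly, feature 1 nonlinearly,
# and feature 2 is irrelevant to the outcome.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 3))
y = black_box_predict(X) + rng.normal(scale=0.5, size=n)
taus = np.linspace(0.1, 0.9, 9)

curves = {j: importance_curve(black_box_predict, X, y, j, taus) for j in range(3)}
for j, curve in curves.items():
    print(j, np.round(curve, 3))
```

In this toy setting the curve for feature 0 is flat, because a linear effect shifts every quantile equally. The curve for feature 2 is zero, and the curve for feature 1 can vary with τ. A curve that is zero at every τ on the grid marks the kind of feature a sparsification step would aim to remove.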
