Marginal and Conditional Importance Measures from Machine Learning Models and Their Relationship with Conditional Average Treatment Effect
Mohammad Kaviul Anam Khan, Olli Saarela, Rafal Kustra
TL;DR
This work addresses the interpretation of black-box predictors by introducing the Marginal Variable Importance Metric (MVIM), a model-agnostic measure defined through the true conditional expectation $f_0$, and the Conditional Variable Importance Metric (CVIM), a conditional-permutation counterpart designed to mitigate correlation bias. MVIM can be expressed as a quadratic function of the conditional average treatment effect (CATE) for multinomial and continuous treatments, which links predictive importance to causal structure. Its permutation-based estimator, however, is biased when predictors are correlated, because the fitted model must extrapolate into low-density regions. The authors develop a bias-variance decomposition, introduce a delta term, and show that CVIM and the adjusted variant, AMVIM, are less sensitive to predictor correlations and can signal near-violations of positivity; in simulations, CVIM converges faster than MVIM. Together, MVIM, CVIM, and AMVIM form a model-agnostic, causally interpretable suite of importance measures for binary, multinomial, and continuous treatments, with practical guidance for variable importance when predictors are correlated.
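To make the quadratic link concrete, the following is a minimal sketch for a binary treatment, not the paper's own derivation. It rests on illustrative assumptions that are not taken from the source: MVIM for a treatment $A \in \{0,1\}$ is taken to be the increase in expected squared error when $A$ is replaced by an independent draw $A'$ from its marginal distribution; the outcome is $Y = f_0(A, X) + \varepsilon$ with $E[\varepsilon \mid A, X] = 0$; and $\tau(X) = f_0(1, X) - f_0(0, X)$ coincides with the CATE under the usual identification conditions. Write $p = P(A = 1)$ and $e(X) = P(A = 1 \mid X)$.

$$
\begin{aligned}
\mathrm{MVIM}_A &= E\big[(Y - f_0(A', X))^2\big] - E\big[(Y - f_0(A, X))^2\big] \\
&= E\big[(\varepsilon + (A - A')\,\tau(X))^2\big] - E\big[\varepsilon^2\big] \\
&= E\big[(A - A')^2\,\tau(X)^2\big] \\
&= E\big[\{e(X)(1 - p) + (1 - e(X))\,p\}\,\tau(X)^2\big].
\end{aligned}
$$

The cross term $2E[\varepsilon\,(A - A')\,\tau(X)]$ vanishes because $A'$ is independent of $(A, X, \varepsilon)$, so $E[\varepsilon \mid A, A', X] = 0$. If the treatment is also independent of $X$, as in a randomized experiment, the last line reduces to $2p(1 - p)\,E[\tau(X)^2]$, which is quadratic in the CATE under these assumptions.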
Abstract
Interpreting black-box machine learning models is challenging because of their strong dependence on data and their inherently non-parametric nature. This paper reintroduces the concept of importance through the Marginal Variable Importance Metric (MVIM), a model-agnostic measure of predictor importance based on the true conditional expectation function. MVIM evaluates predictors' influence on continuous or discrete outcomes. We propose a permutation-based estimator of MVIM, inspired by \citet{breiman2001random} and \citet{fisher2019all}. This estimator is biased when predictors are highly correlated, because black-box models struggle to extrapolate in low-probability regions. To address this, we examine the bias-variance decomposition of MVIM to understand the source and pattern of the bias under high correlation. We then introduce a Conditional Variable Importance Metric (CVIM), adapted from \citet{strobl2008conditional}, to reduce this bias. Both MVIM and CVIM exhibit a quadratic relationship with the conditional average treatment effect (CATE).
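The permutation idea described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' estimator: the function names `marginal_importance` and `conditional_importance` are hypothetical, importance is taken to be the increase in mean squared error after shuffling a column, the conditional version shuffles only within quartile bins of a covariate (a crude stand-in for a tree-based conditioning scheme), and the toy data use a randomized binary treatment with a known $f_0$ so that no fitted model is involved.

```python
import numpy as np

def mse(y, pred):
    return float(np.mean((y - pred) ** 2))

def marginal_importance(predict, X, y, col, rng, n_repeats=20):
    """Mean increase in MSE after shuffling column `col` across all rows."""
    base = mse(y, predict(X))
    gains = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, col] = rng.permutation(X[:, col])
        gains.append(mse(y, predict(Xp)) - base)
    return float(np.mean(gains))

def conditional_importance(predict, X, y, col, strata, rng, n_repeats=20):
    """Same as above, but `col` is shuffled only within each stratum."""
    base = mse(y, predict(X))
    gains = []
    for _ in range(n_repeats):
        Xp = X.copy()
        for s in np.unique(strata):
            idx = np.flatnonzero(strata == s)
            Xp[idx, col] = X[rng.permutation(idx), col]
        gains.append(mse(y, predict(Xp)) - base)
    return float(np.mean(gains))

# Toy data (assumed for illustration): randomized treatment a, covariate x,
# and a heterogeneous effect tau(x) = 1 + 0.5 * x.
rng = np.random.default_rng(0)
n, p = 20000, 0.3
x = rng.normal(size=n)
a = rng.binomial(1, p, size=n).astype(float)
tau = 1.0 + 0.5 * x
y = x + a * tau + rng.normal(scale=0.5, size=n)
X = np.column_stack([a, x])

def f0(X):
    # True conditional expectation E[Y | A, X] for the toy data above.
    return X[:, 1] + X[:, 0] * (1.0 + 0.5 * X[:, 1])

mvim_hat = marginal_importance(f0, X, y, col=0, rng=rng)

# Quartile bins of x serve as the conditioning strata (an assumption).
strata = np.digitize(x, np.quantile(x, [0.25, 0.5, 0.75]))
cvim_hat = conditional_importance(f0, X, y, col=0, strata=strata, rng=rng)

# Under the MSE-increase definition assumed here and a treatment that is
# independent of x, the target value is 2 p (1 - p) E[tau(X)^2].
closed_form = 2 * p * (1 - p) * float(np.mean(tau ** 2))
print(mvim_hat, cvim_hat, closed_form)
```

Because the treatment here is randomized and the true $f_0$ is used for prediction, the marginal and conditional estimates should roughly agree. The bias discussed in the abstract would only appear once a fitted model replaces `f0` and the shuffled column is correlated with the others, since the marginal shuffle then creates rows in regions where the model has seen little data.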
