
Imputation Uncertainty in Interpretable Machine Learning Methods

Pegah Golchian, Marvin N. Wright

TL;DR

This work addresses how missing data and imputation uncertainty influence interpretable machine learning explanations (PD, PFI, SHAP). It extends the learner-Ψ framework to include imputation uncertainty and evaluates confidence interval (CI) coverage, width and bias under MCAR, MAR and MNAR missingness, using both single and multiple imputation across linear and non-linear data-generating processes. The main findings are that single imputation underestimates variance and can mislead interpretation, while multiple imputation substantially improves CI coverage at the cost of wider intervals. Which multiple imputation method works best (MICE PMM or MICE RF) depends on the data-generating process. A real-data example confirms that ignoring imputation uncertainty can drastically alter the inferred feature importance, so the authors recommend multiple imputation to make interpretations more reliable when values are missing.
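The three missingness mechanisms named above (MCAR, MAR, MNAR) can be illustrated with a small NumPy sketch. The helper `make_missing`, the two-column layout and the score-thresholding scheme are hypothetical choices made for illustration; they are not the paper's amputation procedure.

```python
import numpy as np

def make_missing(X, prop, mechanism, rng):
    """Return a copy of X in which a fraction `prop` of column 0 is NaN.

    Hypothetical helper: column 0 is the incomplete feature and
    column 1 is a fully observed covariate that drives MAR missingness.
    """
    n = X.shape[0]
    n_miss = int(round(prop * n))
    if mechanism == "MCAR":
        # Missingness is independent of all data values
        scores = rng.random(n)
    elif mechanism == "MAR":
        # Missingness depends only on the observed covariate (column 1)
        scores = X[:, 1] + rng.normal(scale=0.1, size=n)
    elif mechanism == "MNAR":
        # Missingness depends on the value that goes missing (column 0)
        scores = X[:, 0] + rng.normal(scale=0.1, size=n)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    X_miss = X.astype(float, copy=True)
    if n_miss > 0:
        # Delete the entries with the n_miss largest scores
        idx = np.argsort(scores)[-n_miss:]
        X_miss[idx, 0] = np.nan
    return X_miss

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
for mech in ["MCAR", "MAR", "MNAR"]:
    X_miss = make_missing(X, 0.4, mech, rng)
    print(mech, np.isnan(X_miss[:, 0]).mean())  # 0.4 in each case
```

All three mechanisms delete the same share of values; they differ only in *which* values go missing. Under MNAR in this sketch, the deleted values of column 0 are systematically larger than the observed ones, a situation that imputation based on observed data alone cannot fully correct.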

Abstract

In real data, missing values occur frequently, which affects the interpretations obtained with interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanations may differ between imputation methods, but it ignores the additional imputation uncertainty and its influence on variance and confidence intervals. We therefore compare the effects of different imputation methods on the confidence interval coverage probabilities of three IML methods: permutation feature importance, partial dependence plots and Shapley values. We show that single imputation leads to underestimation of variance and that, in most cases, only multiple imputation comes close to nominal coverage.
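Why single imputation understates variance can be seen from Rubin's rules, the standard way to pool analyses of multiply imputed data. The sketch below assumes Rubin's rules purely for illustration (the paper's own pooling term is not reproduced in this section); the function name `rubin_pool` and the numbers in the usage example are made up.

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m point estimates and their within-imputation variances
    with Rubin's rules; returns (pooled estimate, total variance)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                      # pooled point estimate
    w = u.mean()                          # average within-imputation variance
    b = q.var(ddof=1) if m > 1 else 0.0   # between-imputation variance
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, t

# Made-up PFI estimates from m = 5 imputed datasets, each with
# within-imputation variance 0.001 (illustrative numbers only)
q_bar, t = rubin_pool([0.30, 0.36, 0.27, 0.33, 0.34], [0.001] * 5)
print(round(q_bar, 4), round(t, 6))  # 0.32 0.0025
```

With a single imputed dataset there is no between-imputation term, so the reported variance would be the within-imputation value of 0.001, i.e. 2.5 times smaller than the pooled total in this toy example. Confidence intervals built from it are correspondingly too narrow.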

Paper Structure

This paper contains 12 sections, 4 equations and 15 figures.

Figures (15)

  • Figure 1: The estimates of the model $\hat{f}$ and their explanations with IML methods $\widehat{\Psi}_{\hat{f}}$ (PFI, PD and SHAP) deviate from their ground truths $f$ and $\Psi_{f}$ due to learner bias and variance and to Monte Carlo (MC) integration. Model-$\Psi$ is the explanation of a fixed model and accounts only for Monte Carlo uncertainty, whereas learner-$\Psi$ additionally accounts for model uncertainty. The figure adds imputation uncertainty as a further source of error in the model and IML estimates, extending the illustration from Molnar et al. (molnar2023relating).
  • Figure 2: Overall procedure of the experiment: 1) Introduce missing values at a given proportion according to MCAR, MAR or MNAR. 2) Impute the data $m$ different times. 3) Apply a sampling strategy to each imputed dataset, creating $k$ different train and test sets. 4) Fit a model on each of the $m\cdot k$ training datasets. 5) Explain each model with an IML method using the test data. 6) Pool the results into a point estimate with a standard error, using either the adjusted or the unadjusted variance term (the adjusted variance equation). 7) Repeat the experiment 1000 times. 8) Assess how well imputation uncertainty is captured, based on coverage, average CI width and bias.
  • Figure 3: Coverage rates of bootstrapped XGBoost across the number of model refits. 40% missingness was introduced under a MAR pattern and imputed with various methods; results are compared to the ground truth from the complete dataset (red) and averaged over 1000 replicates. The black dashed line marks the nominal coverage level of 0.95.
  • Figure 4: Average CI width of bootstrapped XGBoost across missingness proportions, with 15 model refits. Missingness was introduced under a MAR pattern and imputed with various methods; results are compared to the ground truth from the complete dataset (red) and averaged over 1000 replicates.
  • Figure 5: Bias of bootstrapped XGBoost across missingness proportions, with 15 model refits. Missingness was introduced under a MAR pattern and imputed with various methods. Results are averaged over 1000 replicates.
  • ...and 10 more figures
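Steps 6 and 8 of the procedure in Figure 2 can be sketched as follows. This sketch rests on several assumptions: it treats the $m\cdot k$ estimates of one replicate as a single sample, uses the Nadeau-Bengio correction factor $(1/J + n_\text{test}/n_\text{train})$ as a stand-in for the paper's adjusted variance term (whose exact form is not reproduced here), and uses a normal rather than a t quantile for simplicity. The functions `pooled_ci` and `evaluate`, the choice $m = 5$ and all simulated numbers are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def pooled_ci(estimates, n_train, n_test, adjusted=True, level=0.95):
    """CI for one replicate from its J = m*k explanation estimates.

    Assumptions: the J estimates are treated as one sample, and
    adjusted=True applies the Nadeau-Bengio factor (1/J + n_test/n_train)
    to the sample variance; adjusted=False uses the naive s^2 / J.
    A normal quantile is used for simplicity.
    """
    est = np.asarray(estimates, dtype=float)
    j = est.size
    mean = est.mean()
    s2 = est.var(ddof=1)
    var = (1 / j + n_test / n_train) * s2 if adjusted else s2 / j
    half = NormalDist().inv_cdf(0.5 + level / 2) * np.sqrt(var)
    return mean - half, mean, mean + half

def evaluate(cis, truth):
    """Coverage, average CI width and bias over replicates (step 8)."""
    lo, mid, hi = np.asarray(cis, dtype=float).T
    coverage = np.mean((lo <= truth) & (truth <= hi))
    width = np.mean(hi - lo)
    bias = np.mean(mid) - truth
    return coverage, width, bias

# Toy usage: m = 5 imputations x k = 15 refits (illustrative values)
rng = np.random.default_rng(1)
truth, J = 1.0, 5 * 15
cis_adj, cis_naive = [], []
for _ in range(1000):  # 1000 replicates, as in step 7
    # Toy replicate-level error shared by all J estimates, which
    # the within-replicate spread alone does not reveal
    shift = rng.normal(scale=0.1)
    est = truth + shift + rng.normal(scale=0.1, size=J)
    cis_adj.append(pooled_ci(est, n_train=800, n_test=200))
    cis_naive.append(pooled_ci(est, n_train=800, n_test=200, adjusted=False))
print("adjusted:", evaluate(cis_adj, truth))
print("naive:   ", evaluate(cis_naive, truth))
```

In this toy setting the naive interval ignores the shared replicate-level error and covers the truth far less often than the adjusted one. This is loosely analogous to how single imputation, by ignoring the spread between imputations, produces intervals that are too narrow.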