Missing At Random as Covariate Shift: Correcting Bias in Iterative Imputation

Luke Shannon, Song Liu, Katarzyna Reluga

TL;DR

This work addresses bias in iterative data imputation caused by MAR-induced covariate shift between observed and missing values. It reframes imputation as a risk minimisation problem and derives principled importance weights to align the training distribution with the unobserved target distribution. A bias-aware, weighted iterative imputation algorithm jointly estimates weights and conditional imputation models, using a density-ratio-based approach within a round-robin framework. Across eight diverse datasets, the proposed method reduces RMSE by up to 7% and Wasserstein distance by up to 20% relative to unweighted baselines, demonstrating practical improvements for downstream tasks while highlighting the importance of accounting for MAR in imputation.

Abstract

Accurate imputation of missing data is critical to downstream machine learning performance. We formulate missing data imputation as a risk minimisation problem, which highlights a covariate shift between the observed and unobserved data distributions. This covariate-shift-induced bias is not accounted for by popular imputation methods and leads to suboptimal performance. In this paper, we derive theoretically valid importance weights that correct for the induced distributional bias. Furthermore, we propose a novel imputation algorithm that jointly estimates both the importance weights and imputation models, enabling bias correction throughout the imputation process. Empirical results across benchmark datasets show reductions in root mean squared error and Wasserstein distance of up to 7% and 20%, respectively, compared to otherwise identical unweighted methods.
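The idea of importance-weighting an imputation model can be illustrated with a minimal sketch for a single incomplete column. It is not the paper's algorithm: the data-generating process, the missingness mechanism, and all helper names (`add_intercept`, `fit_logistic`, `fit_linear`, `rmse`) are assumptions made for illustration, and the round-robin loop over several incomplete columns is omitted. The sketch relies on the standard density-ratio identity: for a fully observed covariate $x_1$, the ratio $p(x_1 \mid \text{missing}) / p(x_1 \mid \text{observed})$ is proportional to the odds $P(\text{missing} \mid x_1) / P(\text{observed} \mid x_1)$, which a probabilistic classifier can estimate. The linear imputation model is deliberately misspecified, since weighting only changes the fit when the model class cannot match the true conditional mean everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: X1 is fully observed, X2 depends nonlinearly on X1.
n = 4000
x1 = rng.normal(size=n)
x2 = x1**2 + 0.1 * rng.normal(size=n)

# Assumed MAR mechanism: X2 is more likely to be missing when X1 is large.
missing = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * x1))
observed = ~missing


def add_intercept(x):
    """Stack a column of ones next to a 1-D covariate."""
    return np.column_stack([np.ones(len(x)), x])


def fit_logistic(x, labels, n_iter=25):
    """Logistic regression fitted with Newton's method; returns [intercept, slope]."""
    design = add_intercept(x)
    coef = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ coef))
        grad = design.T @ (labels - prob)
        hess = (design * (prob * (1.0 - prob))[:, None]).T @ design
        coef += np.linalg.solve(hess, grad)
    return coef


def fit_linear(x, y, weights=None):
    """(Weighted) least squares for y ~ intercept + slope * x."""
    design = add_intercept(x)
    w = np.ones(len(x)) if weights is None else weights
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef


def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))


# Step 1: estimate P(missing | x1) from all rows (x1 is always observed).
logit_coef = fit_logistic(x1, missing.astype(float))

# Step 2: importance weights for observed rows = estimated odds of being missing,
# normalised to mean 1 (a constant factor does not change the weighted fit).
weights = np.exp(add_intercept(x1[observed]) @ logit_coef)
weights /= weights.mean()

# Step 3: fit the imputation model on observed rows, with and without weights.
coef_unweighted = fit_linear(x1[observed], x2[observed])
coef_weighted = fit_linear(x1[observed], x2[observed], weights)

# Evaluate on the rows whose X2 is missing (possible only because data are simulated).
design_missing = add_intercept(x1[missing])
rmse_unweighted = rmse(design_missing @ coef_unweighted, x2[missing])
rmse_weighted = rmse(design_missing @ coef_weighted, x2[missing])
print(f"unweighted RMSE: {rmse_unweighted:.3f}, weighted RMSE: {rmse_weighted:.3f}")
```

In this toy setting the weighted fit targets the region of $x_1$ where values are actually missing, so its error on the missing entries is lower than that of the unweighted fit. In practice the estimated odds can be heavy-tailed, so clipping or otherwise regularising the weights is a common precaution.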

Paper Structure

This paper contains 40 sections, 8 theorems, 64 equations, 7 figures, 1 table, 2 algorithms.

Key Result

Proposition 3.2

For a deterministic imputation function (Definition 3.1), under mean squared error (MSE) loss, the risk in eq:GeneralImputationProblem admits a coordinatewise decomposition across the components of $g$.

Figures (7)

  • Figure 1: The top panel illustrates a two-dimensional dataset where missingness in $X_2$ depends on $X_1$, inducing a covariate shift between observed and missing entries. Applying importance weighting (bottom panel) corrects this shift and yields an improved imputation function (top, dashed red line).
  • Figure 2: Comparison of iterative imputation methods across multiple datasets (lower is better). Bars are grouped by conditional model type: Linear, Random Forest, or Multi-Layer Perceptron. The top row shows RMSE, and the bottom row shows Wasserstein distance. Within each conditional model type, methods underlined in the legend are compared to isolate the effect of weighting on imputation performance. A $*$ indicates a statistically significant improvement according to a two-sided Wilcoxon signed-rank test ($p < 0.05$). The plot is best viewed in colour.
  • Figure 3: Comparison of our weighted imputation method to unweighted baselines across varying observed data sizes, obtained by subsampling the rows to a given size (left); missing feature counts, obtained by subsampling the columns to a given size (centre); and missingness rates, obtained by tuning $\beta_{i_k}$, as defined in Section \ref{sec:MARSimulation}, to give a target missingness rate (right). The plot is best viewed in colour.
  • Figure 4: Performance ratio of weighted iterative methods to their unweighted counterparts (values $<1$ indicate better performance for our method), stratified by conditional model type. The inverted V-shaped trend indicates that increasing $|\alpha|$ strengthens MAR-induced covariate shift, leading to growing bias in unweighted methods, which our weighted approach is designed to mitigate.
  • Figure 5: RMSE (lower is better) of tuned XGB and Ridge regression models over varying imputation methods. Performance is reported for a held-out test set. A $*$ denotes significantly better performance of a method compared to its counterpart, tested using a paired Wilcoxon signed-rank test ($p < 0.05$). The plot is best viewed in colour.
  • ...and 2 more figures

Theorems & Definitions (22)

  • Definition 3.1: Deterministic imputation map
  • Proposition 3.2
  • Definition 3.3: Set of imputed coordinates
  • Proposition 3.4
  • Proposition 4.1: Linear optimal imputation ignores $R_{\neg i}$
  • Definition 4.2: Projection onto observed covariates
  • Definition 4.3: Class of projected functions
  • Proposition 4.4: Optimal predictor depends only on $X_{\mathrm{obs}}$
  • Definition 4.5
  • Corollary 4.6: Equivalence of optimization over observed covariates
  • ...and 12 more