Masking criteria for selecting an imputation model

Yanjiao Yang, Daniel Suen, Yen-Chi Chen

TL;DR

This work interrogates how to select imputation models under missing data using masking criteria. It first analyzes the conventional mask-one-out (MOO) procedure, showing its optimal target is the marginal distribution $p(x_j|x_r,r\oplus e_j)$ and that it may ignore stochasticity, motivating MOORT and MOOEN as distributional alternatives. It then introduces a likelihood-based framework (MOO likelihood) to learn imputation models from data, establishes identifiability, asymptotic normality, and BIC-based model selection, and connects masking to MAR/MCAR in monotone missing data. Across simulations and real data, MOORT and MOOEN provide robust, distributionally faithful imputation utilities while offering practical tools for comparing and learning imputation models. The results yield a practical visualization, the Prediction-Imputation diagram, to balance predictive accuracy with imputation fidelity in applied settings.
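The toy sketch below makes concrete the claim that MOO "may ignore stochasticity." It assumes two things. First, an energy-distance criterion scores each masked entry with the sample energy score, which equals half the energy distance between a point mass at the true value and the empirical distribution of the imputation draws. Second, the MOO side uses squared error averaged over the draws. The function names `energy_score` and `squared_error` are hypothetical, and this is not the paper's MOOEN implementation.

```python
from itertools import product


def energy_score(truth, draws):
    """Sample energy score of imputation draws against one masked true value.

    Lower is better: mean |draw - truth| minus half the mean pairwise
    distance between draws (the spread term credits honest variability).
    """
    m = len(draws)
    if m == 0:
        raise ValueError("need at least one imputation draw")
    closeness = sum(abs(y - truth) for y in draws) / m
    spread = sum(abs(a - b) for a, b in product(draws, repeat=2)) / (2 * m * m)
    return closeness - spread


def squared_error(truth, draws):
    """MOO-style squared error, averaged over the imputation draws."""
    return sum((y - truth) ** 2 for y in draws) / len(draws)


# Hypothetical toy setting: the masked value is -1 or +1 with equal probability.
masked_values = [-1.0, 1.0]
deterministic = [0.0]        # always impute the mean
stochastic = [-1.0, 1.0]     # draws from the true distribution

for name, draws in [("deterministic", deterministic), ("stochastic", stochastic)]:
    mse = sum(squared_error(x, draws) for x in masked_values) / len(masked_values)
    es = sum(energy_score(x, draws) for x in masked_values) / len(masked_values)
    print(f"{name}: squared error = {mse}, energy score = {es}")
```

Always imputing 0 wins under squared error (1.0 versus 2.0). Drawing from the true distribution wins under the energy score (0.5 versus 1.0).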

Abstract

The masking-one-out (MOO) procedure, masking an observed entry and comparing it with its imputed values, is a very common procedure for comparing imputation models. We study the optimum of this procedure, generalize it to a missing data assumption, and establish the corresponding semi-parametric efficiency theory. However, MOO is a measure of prediction accuracy, which is not ideal for evaluating an imputation model. To address this issue, we introduce three modified MOO criteria, based on rank transformation, energy distance, and the likelihood principle, that allow us to select an imputation model that properly accounts for the stochastic nature of data. The likelihood approach further enables an elegant framework for learning an imputation model from the data, and we derive its statistical and computational learning theories as well as the consistency of BIC model selection. We also show how MOO is related to the missing-at-random assumption. Finally, we introduce the prediction-imputation diagram, a two-dimensional diagram that visually compares the prediction and imputation utilities of various imputation models.
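The basic MOO loop can be sketched as below. This minimal illustration assumes squared-error loss and a column-mean imputer. `moo_risk` and `column_mean_imputer` are hypothetical names, and a real imputation model would replace the imputer callable.

```python
from typing import Callable, List, Optional

Row = List[Optional[float]]  # None marks a missing entry


def column_mean_imputer(data: List[Row], i: int, j: int) -> float:
    """Hypothetical imputer: mean of the observed values in column j (i is unused here)."""
    observed = [row[j] for row in data if row[j] is not None]
    if not observed:
        raise ValueError(f"column {j} has no observed values to impute from")
    return sum(observed) / len(observed)


def moo_risk(data: List[Row], impute: Callable[[List[Row], int, int], float]) -> float:
    """Mask each observed entry in turn, impute it, and average the squared errors."""
    errors = []
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is None:
                continue  # only observed entries can be masked and checked
            masked = [list(r) for r in data]  # copy so the original stays intact
            masked[i][j] = None               # mask one observed entry
            imputed = impute(masked, i, j)
            errors.append((value - imputed) ** 2)
    if not errors:
        raise ValueError("no observed entries to mask")
    return sum(errors) / len(errors)


data = [[1.0, 2.0], [3.0, None], [5.0, 6.0]]
print(moo_risk(data, column_mean_imputer))
```

On this three-row toy dataset the five masked entries give squared errors 9, 16, 0, 9, and 16, so the printed risk is 10.0. The imputer receives the masked copy of the data, so the held-out value never leaks into its own imputation.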

Paper Structure

This paper contains 48 sections, 16 theorems, 208 equations, 4 figures, 4 tables, 7 algorithms.

Key Result

Theorem 2.1

For an observation $(x_r,r)$, let $j \in \bar{r}$ be the index of an unobserved variable. For the missing variable $x_j$,
$$\widehat{x}^*_j = \mathbb{E}\left(x_j \mid x_r, r\oplus e_j\right) = \int x_j \, p(x_j \mid x_r, r\oplus e_j)\, dx_j$$
is the optimal imputation value under the population risk $\mathcal{E}(q)$. Namely, for the observation $(x_r,r)$, the optimal imputation model imputes the missing variable $x_j$ with $\widehat{x}^*_j$ for every $j\in \bar{r}$.
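Assuming the MOO population risk is built from squared error, a short derivation shows why a conditional mean is the natural optimum. Write $\mu_j = \mathbb{E}(x_j \mid x_r, r\oplus e_j)$ (notation introduced here). For any constant imputation $c$, adding and subtracting $\mu_j$ gives

$$
\mathbb{E}\left[(x_j - c)^2 \mid x_r, r\oplus e_j\right]
= \mathbb{E}\left[(x_j - \mu_j)^2 \mid x_r, r\oplus e_j\right] + (\mu_j - c)^2
= \operatorname{Var}(x_j \mid x_r, r\oplus e_j) + (\mu_j - c)^2,
$$

because the cross term $2(\mu_j - c)\,\mathbb{E}(x_j - \mu_j \mid x_r, r\oplus e_j)$ vanishes. The variance term does not depend on $c$, so the risk is minimized at $c = \mu_j$. The criterion therefore rewards getting the center right and gives no credit for reproducing the spread of $x_j$.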

Figures (4)

  • Figure 1: Prediction-Imputation (PI) Diagram comparing imputation methods (CCMV, EM, mean imputation, MICE, MMG, and nearest-neighbor hot deck) under the MOO, MOORT, and MOOEN criteria across simulation datasets. Methods closer to the lower-left region indicate lower risks and better performance.
  • Figure 2: Prediction-Imputation (PI) Diagram comparing imputation methods (CCMV, EM, mean imputation, MICE, MMG, and nearest-neighbor hot deck) under the MOO, MOORT, and MOOEN criteria on the NACC dataset for the DIGFORCT variable.
  • Figure 3: Prediction-Imputation (PI) Diagram under variable-wise MOO, MOORT, and MOOEN criteria on the NACC dataset (DIGFORCT). Each variable is the DIGFORCT score at a different visit.
  • Figure 4: Prediction-Imputation (PI) Diagram comparing imputation methods (CCMV, EM, mean imputation, MICE, MMG, nearest-neighbor hot deck, and random hot deck) under MOO, MOORT, and MOOEN criteria across multiple datasets.
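Reading a PI diagram means comparing two risks at once. One way to make "closer to the lower-left" precise is to keep only the methods that no other method beats on both risks, which gives a Pareto front. This sketch shows that reading and is not a procedure from the paper. `pi_front` and the risk pairs are hypothetical and are not results from the figures.

```python
from typing import Dict, List, Tuple


def pi_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return methods not dominated on both risks (lower is better on each axis)."""
    front = []
    for name, (pred, imp) in points.items():
        dominated = any(
            pred2 <= pred and imp2 <= imp and (pred2 < pred or imp2 < imp)
            for other, (pred2, imp2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (prediction risk, imputation risk) pairs for four unnamed methods.
points = {"A": (1.0, 2.0), "B": (2.0, 1.0), "C": (2.5, 2.5), "D": (1.0, 1.5)}
print(pi_front(points))  # ['B', 'D']
```

A and C are dominated: D matches A on the first risk and improves on the second, and B improves on C on both. B and D trade off against each other, so both stay on the front.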

Theorems & Definitions (42)

  • Example
  • Theorem 2.1: Optimal imputation value of MOO
  • Example
  • Proposition 2.2
  • Theorem 2.3: Efficient influence function for marginal mean
  • Theorem 2.4: Multiple robustness
  • Example: Failure of deterministic imputation
  • Theorem 3.1: Consistency of MOORT procedure
  • Theorem 3.2: Consistency of MOOEN procedure
  • Theorem 4.1: Asymptotic normality of MOO-MLE
  • ...and 32 more
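The list above includes asymptotic normality of the MOO-MLE, and the abstract mentions consistency of BIC model selection built on the MOO likelihood. The sketch below applies the textbook BIC rule to candidate models. It assumes the standard penalty with $n$ taken as the sample size. The paper's exact penalty and its definition of the MOO log-likelihood are not reproduced here, and `bic`, `select_by_bic`, and the numbers are hypothetical.

```python
import math
from typing import Dict, Tuple


def bic(loglik: float, n_params: int, n: int) -> float:
    """Standard BIC: -2 * maximized log-likelihood + n_params * log(n)."""
    return -2.0 * loglik + n_params * math.log(n)


def select_by_bic(models: Dict[str, Tuple[float, int]], n: int) -> str:
    """Pick the candidate with the smallest BIC; models map name -> (loglik, n_params)."""
    return min(models, key=lambda name: bic(models[name][0], models[name][1], n))


# Hypothetical maximized log-likelihoods and parameter counts, for illustration only.
candidates = {"small": (-100.0, 2), "large": (-98.0, 5)}
print(select_by_bic(candidates, n=100))  # small
```

The larger model gains 2 in log-likelihood but pays three extra parameters at $\log 100 \approx 4.6$ each, so the smaller model has the lower BIC.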