Multiple imputation and full law identifiability
Juha Karvanen, Santtu Tikka
TL;DR
The paper studies identifiability in missing-data problems by distinguishing the target law $p(\mathbf{O}, \mathbf{X})$ from the full law $p(\mathbf{O}, \mathbf{X}, \mathbf{R})$. It proves that imputations can be drawn from the correct conditional distributions for all missing data patterns if and only if the full law is identifiable. Consequently, standard multiple imputation (MI) is valid exactly when $p(\mathbf{O}, \mathbf{X}, \mathbf{R})$ is identifiable; when only the target law is identifiable, non-standard approaches are required. The paper introduces factorizable imputation, a monotone-imputation strategy that can yield valid imputations when the target law is identifiable even if the full law is not, and it discusses using identifying functionals in that setting. DAG-based examples and simulations show when MI, MIRI, FMI, or identifying functionals yield unbiased estimates, which helps practitioners choose a method when data are missing not at random (MNAR). In this way the paper connects identifiability theory to practical criteria for selecting an imputation approach.
Abstract
The central challenges in missing data models concern the identifiability of two distributions: the target law and the full law. The target law refers to the joint distribution of the data variables, whereas the full law refers to the joint distribution of the data variables and their corresponding response indicators. However, the relationship between the identifiability of these two distributions and the feasibility of multiple imputation has not been clearly established. We show that imputations can be drawn from the correct conditional distributions for all possible missing data patterns if and only if the full law is identifiable. This result implies that standard multiple imputation methods -- which keep observed values unchanged and replace missing values with imputed values -- are invalid when the target law is identifiable but the full law is not. We demonstrate that alternative imputation strategies, in which certain observed values are also imputed, can enable the estimation of the target law in such cases.
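The relationship between the two laws and the imputation distributions can be written out explicitly. The following is an illustrative sketch and not the paper's own derivation. It assumes that $\mathbf{O}$ denotes the fully observed variables, $\mathbf{X}$ the partially observed variables, and $\mathbf{R}$ the response indicators. The index sets $m(\mathbf{r})$ and $o(\mathbf{r})$, for the missing and observed components of $\mathbf{X}$ under pattern $\mathbf{r}$, are notation introduced here for illustration. The integral is replaced by a sum for discrete variables.

```latex
\begin{align*}
p(\mathbf{O}, \mathbf{X}, \mathbf{R})
  &= p(\mathbf{O}, \mathbf{X})\, p(\mathbf{R} \mid \mathbf{O}, \mathbf{X}), \\
p(\mathbf{O}, \mathbf{X})
  &= \sum_{\mathbf{r}} p(\mathbf{O}, \mathbf{X}, \mathbf{R} = \mathbf{r}), \\
p(\mathbf{X}_{m(\mathbf{r})} \mid \mathbf{X}_{o(\mathbf{r})}, \mathbf{O}, \mathbf{R} = \mathbf{r})
  &= \frac{p(\mathbf{O}, \mathbf{X}, \mathbf{R} = \mathbf{r})}
          {\int p(\mathbf{O}, \mathbf{X}, \mathbf{R} = \mathbf{r}) \, d\mathbf{X}_{m(\mathbf{r})}}.
\end{align*}
```

The second line shows that the target law is a marginal of the full law, so an identifiable full law implies an identifiable target law. The third line shows that the imputation distribution for each pattern $\mathbf{r}$ is a functional of the full law alone, which gives some intuition for why identifiability of the full law is sufficient for drawing correct imputations for every pattern.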
