Multiple imputation and full law identifiability

Juha Karvanen, Santtu Tikka

TL;DR

The paper addresses identifiability in missing-data problems by distinguishing the target law $p(\mathbf{O}, \mathbf{X})$ from the full law $p(\mathbf{O}, \mathbf{X}, \mathbf{R})$. It proves that imputations can be drawn from the correct conditional distributions for all missing data patterns if and only if the full law is identifiable. Consequently, standard multiple imputation (MI) is valid for estimating the full (and thus the target) law exactly when $p(\mathbf{O}, \mathbf{X}, \mathbf{R})$ is identifiable; otherwise, non-standard approaches are required. The paper introduces factorizable imputation as a monotone-imputation strategy that can yield valid imputations when the target law is identifiable even if the full law is not, and it discusses using identifying functionals when only the target law is identifiable. DAG-based examples and simulations demonstrate when MI, MIRI, FMI, or identifying functionals yield unbiased estimates, which guides practitioners in choosing a missing-data method under MNAR. Overall, the paper links identifiability theory with practical imputation strategies and offers principled criteria for selecting an imputation approach in complex missing-data settings.

Abstract

The central challenges in missing data models concern the identifiability of two distributions: the target law and the full law. The target law refers to the joint distribution of the data variables, whereas the full law refers to the joint distribution of the data variables and their corresponding response indicators. However, the relationship between the identifiability of these two distributions and the feasibility of multiple imputation has not been clearly established. We show that imputations can be drawn from the correct conditional distributions for all possible missing data patterns if and only if the full law is identifiable. This result implies that standard multiple imputation methods -- which keep observed values unchanged and replace missing values with imputed values -- are invalid when the target law is identifiable but the full law is not. We demonstrate that alternative imputation strategies, in which certain observed values are also imputed, can enable the estimation of the target law in such cases.
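
As a rough illustration of what the abstract calls standard multiple imputation (observed values kept unchanged, missing values replaced by imputed draws), the following Python sketch may help. It is not taken from the paper. The data-generating model, the function names `simulate`, `impute_once`, and `mi_mean`, and all numeric constants are assumptions made for illustration. Missingness in `y` depends only on the fully observed `x`, a setting in which the full law is identifiable. For brevity the sketch draws imputations from a fitted regression without also drawing the regression parameters, so it is a simplified variant rather than proper Bayesian multiple imputation.

```python
import numpy as np


def simulate(n, rng):
    """Hypothetical model: x is always observed, y is missing with a
    probability that depends on x only (so the full law is identifiable)."""
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(size=n)      # true mean of y is 1.0
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x)))    # P(R = 1 | x)
    r = rng.random(n) < p_obs                   # True means y is observed
    y_star = np.where(r, y, np.nan)             # observed proxy of y
    return x, y_star, r


def impute_once(x, y_star, r, rng):
    """One imputed data set: observed y values are kept unchanged,
    missing y values are drawn from a regression fitted on complete cases."""
    design = np.column_stack([np.ones(r.sum()), x[r]])
    beta, *_ = np.linalg.lstsq(design, y_star[r], rcond=None)
    resid = y_star[r] - design @ beta
    sigma = np.sqrt(resid @ resid / (r.sum() - 2))
    miss = ~r
    y_imp = y_star.copy()
    y_imp[miss] = beta[0] + beta[1] * x[miss] + rng.normal(
        scale=sigma, size=miss.sum()
    )
    return y_imp


def mi_mean(x, y_star, r, m, rng):
    """Average the estimated mean of y over m imputed data sets."""
    return float(np.mean([impute_once(x, y_star, r, rng).mean() for _ in range(m)]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y_star, r = simulate(20000, rng)
    print("complete-case mean:", np.nanmean(y_star))
    print("imputation-based mean:", mi_mean(x, y_star, r, 5, rng))
```

In this assumed setting the complete-case mean is biased upward because larger `x` makes `y` both larger and more likely to be observed, whereas the imputation-based estimate should land close to the true mean of 1.0.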

Paper Structure

This paper contains 9 sections, 3 theorems, 21 equations, 2 figures, 3 tables.

Key Result

Theorem 2.4

Let $\mathcal{M}_\Omega$ be a nonparametric missing data model. A conditionally complete imputation method $\xi(\Omega, p(\mathbf{O}, \mathbf{X}^\ast, \mathbf{R}))$ exists if and only if the full law $p(\mathbf{O}, \mathbf{X}, \mathbf{R})$ is identifiable from $(\Omega, p(\mathbf{O}, \mathbf{X}^\ast, \mathbf{R}))$.
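
For orientation, the distributions named in the theorem can be related as follows. This is a sketch based on the definitions in the abstract; the coding $R_i = 1$ for "observed" is an assumed convention and may differ from the one used in the paper.

$$
p(\mathbf{O}, \mathbf{X}, \mathbf{R}) = p(\mathbf{O}, \mathbf{X})\, p(\mathbf{R} \mid \mathbf{O}, \mathbf{X}),
\qquad
p(\mathbf{O}, \mathbf{X}) = \sum_{\mathbf{r}} p(\mathbf{O}, \mathbf{X}, \mathbf{R} = \mathbf{r}),
\qquad
X_i^\ast =
\begin{cases}
X_i, & R_i = 1, \\
\text{missing}, & R_i = 0.
\end{cases}
$$

Because the target law $p(\mathbf{O}, \mathbf{X})$ is a marginal of the full law, identifiability of the full law implies identifiability of the target law, while the converse can fail.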

Figures (2)

  • Figure 1: Graphs used for the two-variable examples. The full law and the target law are identifiable in graphs (a), (b) and (c). The target law is identifiable also in graphs (d) and (f) but not in graph (e). Proxy variables and edges related to them are not shown for simplicity.
  • Figure 2: Panel (a) shows the graph for Example 4, where the full law is not identifiable but a factorizable imputation method can be used in the ordering $Z \prec W \prec X \prec Y$. Panel (b) shows the graph for Example 5, where the full law is not identifiable and the target law is identifiable, but the conditions of the theorem on the validity of factorizable imputation are not satisfied. Proxy variables and edges related to them are not shown for simplicity.

Theorems & Definitions (10)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Theorem 2.4
  • proof
  • Definition 2.5
  • Theorem 2.6
  • proof
  • Corollary 2.7
  • proof