Weighting-Based Identification and Estimation in Graphical Models of Missing Data

Anna Guo, Razieh Nabi

TL;DR

We address missing-not-at-random (MNAR) data by modeling the missingness mechanism with a conditional DAG and adopting an interventionist view of missingness indicators. A tree-based identification algorithm tracks selection bias across sequences of interventions on the missingness indicators $R$, yielding explicit propensity-score representations and post-intervention kernels for constructing estimators. Building on this, we develop recursive inverse probability weighting procedures whose estimating equations mirror the intervention logic, and we evaluate them in simulations and a real-data application. An accompanying R package, flexMissing, implements the procedures for propensity-score estimation, weighting, and inference on functionals of the target law.

Abstract

We propose a constructive algorithm for identifying complete data distributions in graphical models of missing data. The complete data distribution is unrestricted, while the missingness mechanism is assumed to factorize according to a conditional directed acyclic graph. Our approach follows an interventionist perspective in which missingness indicators are treated as variables that can be intervened on. A central challenge in this setting is that sequences of interventions on missingness indicators may induce and propagate selection bias, so that identification can fail even when a propensity score is invariant to available interventions. To address this challenge, we introduce a tree-based identification algorithm that explicitly tracks the creation and propagation of selection bias and determines whether it can be avoided through admissible intervention strategies. The resulting tree provides both a diagnostic and a constructive characterization of identifiability under a given missingness mechanism. Building on these results, we develop recursive inverse probability weighting procedures that mirror the intervention logic of the identification algorithm, yielding valid estimating equations for both the missingness mechanism and functionals of the complete data distribution. Simulation studies and a real-data application illustrate the practical performance of the proposed methods. An accompanying R package, flexMissing, implements all proposed procedures.
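As a rough illustration of the weighting idea, the following sketch estimates a mean by inverse probability weighting in a toy MNAR model. Everything in it is an assumption made for illustration: the two binary variables, the mechanism in which $R_1$ depends only on $X_2$ and $R_2$ depends only on $X_1$, the numeric probabilities, and the helper names `simulate`, `fit_propensity`, and `ipw_mean_x1`. It is not the paper's tree-based algorithm and does not use the flexMissing API. In this toy model each propensity score can be estimated among rows where its parent is observed, because each indicator is independent of the other indicator given its own parent.

```python
import numpy as np

def simulate(n, rng):
    """Draw binary (X1, X2) and indicators with pa(R1) = {X2}, pa(R2) = {X1}."""
    x1 = rng.binomial(1, 0.5, size=n)
    x2 = rng.binomial(1, np.where(x1 == 1, 0.7, 0.3))
    r1 = rng.binomial(1, np.where(x2 == 1, 0.4, 0.9))  # P(R1 = 1 | X2)
    r2 = rng.binomial(1, np.where(x1 == 1, 0.5, 0.8))  # P(R2 = 1 | X1)
    return x1, x2, r1, r2

def fit_propensity(r, parent, parent_observed):
    """Estimate P(r = 1 | parent = v) for v in {0, 1}, using only rows
    in which the parent variable is observed."""
    return np.array([r[parent_observed & (parent == v)].mean() for v in (0, 1)])

def ipw_mean_x1(x1, x2, r1, r2):
    """Inverse-probability-weighted estimate of E[X1] from observed data."""
    o1, o2 = r1 == 1, r2 == 1
    pi1 = fit_propensity(r1, x2, o2)  # R1 is independent of R2 given X2
    pi2 = fit_propensity(r2, x1, o1)  # R2 is independent of R1 given X1
    cc = o1 & o2  # complete cases: both propensities can be evaluated here
    weights = 1.0 / (pi1[x2[cc]] * pi2[x1[cc]])
    # Incomplete rows contribute zero, so divide by the full sample size.
    return np.sum(weights * x1[cc]) / len(x1)

rng = np.random.default_rng(0)
x1, x2, r1, r2 = simulate(200_000, rng)
complete_case = x1[(r1 == 1) & (r2 == 1)].mean()
ipw = ipw_mean_x1(x1, x2, r1, r2)
print(f"true mean 0.5 | complete-case {complete_case:.3f} | IPW {ipw:.3f}")
```

In this toy setup the complete-case average is biased because both indicators depend on partially observed variables, while the weighted average should land close to the true mean of 0.5. The simulation holds the full values of `x1` and `x2` only for convenience; the estimator reads them solely on rows where the corresponding indicator equals 1.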

Paper Structure

This paper contains 36 sections, 4 theorems, 60 equations, 9 figures, 10 tables, and 3 algorithms.

Key Result

Theorem 1

Assume the identification criterion (eq:id_criteria) holds for $R_k$ using the tree $\mathbb T_k$ in the post-intervention law $p_{\mathbb T_k}=\phi^p_{\sigma_k}\{p\}$. Then $\pi_k(\mathop{\mathrm{pa}}\nolimits_{\mathcal{G}}(R_k))$, evaluated at $\mathcal{S}_k^r=1$, is identified from the observed data law ...

Figures (9)

  • Figure 1: mDAGs illustrating selection behavior under interventions: (a) (in)admissible interventions; (b) non-propagating selection; (c) propagating selection; (d) identification may require interventions on descendants outside the causal path between $R_k$ and $R_j \in \mathcal{R}_k^p$.
  • Figure 2: (a, b, c) Examples used to illustrate the identification algorithm; (d, e, f) the corresponding constructed trees.
  • Figure 3: Tree-based Identification Algorithm $(\mathcal{G}(X, R, X^*))$
  • Figure 4: Simulation results for estimation of a mean using four missing data methods: Amelia, complete-case analysis, MICE, and the proposed tree-based method. Panels correspond to data generated under mDAGs $\mathcal{G}_1$ through $\mathcal{G}_4$.
  • Figure 5: Simulation results for estimating an average causal effect. Missing data are handled using four methods: Amelia, complete-case analysis, MICE, and the proposed tree-based method. Panels correspond to data generated under mDAGs $\mathcal{G}_1$ through $\mathcal{G}_4$.
  • ...and 4 more figures

Theorems & Definitions (14)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Theorem 1: Identification functional induced by $\mathbb T_k$
  • Corollary 2: Observed-data representation
  • Example 7
  • Example 8
  • ...and 4 more