Causal and Counterfactual Views of Missing Data Models

Razieh Nabi; Rohit Bhattacharya; Ilya Shpitser; James M. Robins

TL;DR

The paper makes explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables: the values that would have been observed had it been possible (possibly contrary to fact) to observe them.

Abstract

It is often said that the fundamental problem of causal inference is a missing data problem -- the comparison of responses to two hypothetical treatment assignments is made difficult because for every experimental unit only one potential response is observed. In this paper, we consider the implications of the converse view: that missing data problems are a form of causal inference. We make explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables corresponding to values had we (possibly contrary to fact) been able to observe them. Drawing analogies with causal inference, we show how identification assumptions in missing data can be encoded in terms of graphical models defined over counterfactual and observed variables. We review recent results in missing data identification from this viewpoint. In doing so, we note interesting similarities and differences between missing data and causal identification theories.
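The counterfactual reading described in the abstract can be illustrated with a minimal sketch. It assumes a single variable with a hypothetical full-data value `x_full` (the value that would be seen had it been observed), a missingness indicator `r`, and an observed proxy that equals `x_full` when `r == 1` and is missing otherwise. The data, the function names, and the response probabilities in `propensity` are invented for illustration and are not taken from the paper. The weighted estimator is the standard inverse-probability-weighting idea borrowed from causal inference, shown here only as an analogy.

```python
from typing import List, Optional


def observe(x_full: float, r: int) -> Optional[float]:
    """Observed proxy: the counterfactual value when R = 1, missing (None) when R = 0."""
    return x_full if r == 1 else None


def complete_case_mean(x_obs: List[Optional[float]]) -> float:
    """Mean over the rows that were actually observed."""
    seen = [x for x in x_obs if x is not None]
    return sum(seen) / len(seen)


def ipw_mean(x_obs: List[Optional[float]], propensity: List[float]) -> float:
    """Inverse-probability-weighted mean.

    Each observed value is weighted by 1 / p(R = 1) for its row.
    The propensities are ASSUMED known here; in practice they would have
    to be identified from the observed data law under model assumptions.
    """
    total = sum(x / p for x, p in zip(x_obs, propensity) if x is not None)
    return total / len(x_obs)


# Hypothetical full-data values and missingness indicators (illustrative only).
x_full = [1.0, 2.0, 3.0, 4.0]
r = [1, 0, 1, 1]
x_obs = [observe(x, ri) for x, ri in zip(x_full, r)]

# Hypothetical response probabilities p(R = 1) per row (an assumption).
propensity = [0.5, 0.5, 1.0, 1.0]

cc = complete_case_mean(x_obs)
ipw = ipw_mean(x_obs, propensity)
```

In this toy setup the target is a feature of the distribution of `x_full`, which is never fully seen; only `x_obs` and `r` are available. That mirrors causal inference, where only one potential response per unit is observed.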

Paper Structure (19 sections, 2 theorems, 34 equations, 9 figures)

Introduction
Missing Data Models
Classical Missing Data Models
Counterfactual Views of Classical Missing Data Models
Identification in Missing Data Models
Directed Acyclic Graphs in Causal Inference
Statistical DAG Models
Causal DAG Models
Missing Data DAG Models
Hierarchy of missing data DAG models
Examples of missing data DAG models
Identification in Missing Data DAG Models
Sequential Interventions
Parallel Interventions
Sequential and Parallel Interventions
...and 4 more sections

Key Result

Proposition 1

Under the missingness model of an m-DAG $\mathcal{G}_m$

Figures (9)

Figure 1: (a) A DAG where $U_1$ and $U_2$ may be unmeasured; (b) A conditional DAG illustrating interventions on $R_1$ and $R_2$.
Figure 2: Examples of missing data DAG models.
Figure 3: An illustration of the operation of a sequential identification algorithm. (a) Permutation model; (b) Intervention on $R_2$; (c) Intervention on $R_1$ after $R_2$.
Figure 4: An illustration of the operation of a parallel identification algorithm. (a) Block-parallel; (b) Intervention on $R_2$ and selection on $R_1$; (c) Intervention on $R_1$ and selection on $R_2$.
Figure 5: (a) An example m-DAG corresponding to a model where interventions must be applied both sequentially and in parallel to yield identification; (b) Graph derived from (a) representing the intermediate step of the identification algorithm where $R_2$ and $R_3$ are simultaneously set to $1$.
...and 4 more figures

Theorems & Definitions (2)

Proposition 1
Lemma 1: Invariance property
