Bounding causal effects with an unknown mixture of informative and non-informative missingness

Max Rubinstein, Denis Agniel, Larry Han, Marcela Horvitz-Lennon, Sharon-Lise Normand

Abstract

In experimental and observational data settings, researchers often have limited knowledge of the reasons for missing outcomes. To address this uncertainty, we propose bounds on causal effects for missing outcomes, accommodating the scenario where missingness is an unobserved mixture of informative and non-informative components. Within this mixed missingness framework, we explore several assumptions to derive bounds on causal effects, including bounds expressed as a function of user-specified sensitivity parameters. We develop influence-function-based estimators of these bounds to enable flexible, non-parametric, and machine-learning-based estimation, achieving root-n convergence rates and asymptotic normality under relatively mild conditions. We further consider the identification and estimation of bounds for other causal quantities that remain meaningful when informative missingness reflects a competing outcome, such as death. We conduct simulation studies and illustrate our methodology with a study on the causal effect of antipsychotic drugs on diabetes risk using a health insurance dataset.

Paper Structure

This paper contains 34 sections, 23 theorems, 132 equations, 2 figures, 3 tables.

Key Result

Proposition 1

Under Assumptions asmpt:non-informative-asmpt:cens-positivity,

Figures (2)

  • Figure 1: Sensitivity analysis parameters corresponding to $\Psi_0 = 0$ under the assumptions of Proposition 4. Each line corresponds to a different value of $\tau$, and each point on a line indicates a set of sensitivity parameters ($\tau, \delta_0, \delta_1$) corresponding to $\Psi_0 = 0$. Any values of $(\delta_0, \delta_1)$ to the lower right of the $\tau$-line are sensitivity parameters that can explain the observed association ${\widetilde{\Psi}}$.
  • Figure 2: Nuisance components used in simulation including missingness ($\pi_a(x)$), outcome ($\mu_a(x), \mu^*_a(x)$), and propensity score ($e_1(x)$) models.

Theorems & Definitions (65)

  • Definition 1: Average Treatment Effect (ATE)
  • Proposition 1: Expression for $\Psi_0$
  • Proposition 2: General bounds on ATE
  • Corollary 1: General bounds on ATE, bounded or known proportion informative missingness
  • Proposition 3: Bounds on ATE under monotonicity
  • Remark 1
  • Proposition 4: Bounds on ATE, bounded outcome risk
  • Remark 2
  • Proposition 5: Point identification of the ATE
  • Remark 3
  • ...and 55 more