Efficient Targeted Maximum Likelihood Estimators for Two-Phase Design Problems

Sky Qiu, Susan Gruber, Pamela A. Shaw, Brian D. Williamson, Mark J. van der Laan

Abstract

In a typical two-phase design, a random sample is drawn from the target population in phase 1, during which only a subset of variables is collected. In phase 2, a subsample of the phase-1 cohort is selected, and additional variables are measured. This setting induces a coarsened data structure on the data from the second phase. We assume coarsening at random, that is, the phase-2 sampling mechanism depends only on fully observed variables. We review existing estimators, including the generalized raking estimator and the inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE), along with its extensions that also target the phase-2 sampling mechanism to improve efficiency. We further introduce a new class of estimators constructed within the TMLE framework that are asymptotically equivalent.
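To make the two-phase setup concrete, the following is a minimal sketch of inverse probability weighting under coarsening at random. It is not the paper's IPCW-TMLE or raking estimator; it only illustrates the weighting idea those estimators build on. The data-generating process, the sampling probabilities (0.8 and 0.2), and the function names `simulate_two_phase`, `complete_case_mean`, and `ipw_mean` are all assumptions made for illustration.

```python
import random


def simulate_two_phase(n=20000, seed=0):
    """Simulate a toy two-phase sample (hypothetical data-generating process).

    Phase 1 observes a binary variable w for everyone. Phase 2 measures y
    only for a subsample whose selection probability pi depends on w alone,
    which is the coarsening-at-random assumption.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        w = rng.random() < 0.5                 # phase-1 variable, always observed
        y = rng.gauss(2.0 if w else 0.0, 1.0)  # phase-2 variable, true mean is 1.0
        pi = 0.8 if w else 0.2                 # known phase-2 sampling probability
        selected = rng.random() < pi
        records.append((w, y if selected else None, pi, selected))
    return records


def complete_case_mean(records):
    """Unweighted mean of y among phase-2 units; biased when pi depends on w."""
    ys = [y for (_, y, _, selected) in records if selected]
    return sum(ys) / len(ys)


def ipw_mean(records):
    """Normalized inverse-probability-weighted mean of y over phase-2 units."""
    num = sum(y / pi for (_, y, pi, selected) in records if selected)
    den = sum(1.0 / pi for (_, _, pi, selected) in records if selected)
    return num / den


if __name__ == "__main__":
    data = simulate_two_phase()
    print("complete-case mean:", complete_case_mean(data))
    print("IPW mean:", ipw_mean(data))
```

In this toy setting the complete-case mean is pulled toward the oversampled stratum, while the weighted mean recovers the population mean up to sampling noise. The estimators discussed in the abstract refine this basic weighting to gain efficiency.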

Paper Structure

This paper contains 31 sections, 3 theorems, 88 equations, 1 figure, 8 tables, 3 algorithms.

Key Result

Lemma 5.1

Gives the canonical gradient of the parameter $\Psi:\mathcal{M}\rightarrow\mathbb{R}$ at $P$; the defining expressions are not reproduced here.

Figures (1)

  • Figure 1: Wald-type 95% confidence interval oracle coverage of raking, IPCW-TMLE, and IPCW-TMLE with targeting of $\Pi$. Oracle coverage is defined as the proportion of Monte-Carlo runs (out of 1,000) where the 95% confidence interval computed using the empirical standard error covers the true causal estimand.
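
The oracle coverage described in the caption can be sketched as follows. This is a minimal illustration, assuming that "empirical standard error" means the standard deviation of the point estimates across Monte Carlo runs and that the Wald interval uses the normal quantile 1.96; the function name `oracle_coverage` is hypothetical.

```python
from statistics import stdev


def oracle_coverage(estimates, truth, z=1.96):
    """Proportion of runs whose Wald-type interval covers the true value.

    Assumption: every run uses the same standard error, namely the
    empirical standard deviation of the estimates across all runs.
    """
    se = stdev(estimates)
    covered = sum(
        1 for est in estimates if est - z * se <= truth <= est + z * se
    )
    return covered / len(estimates)


if __name__ == "__main__":
    # Four hypothetical point estimates of a true value of 0.0.
    print(oracle_coverage([0.0, 0.0, 0.0, 10.0], truth=0.0))
```

Because the interval width is taken from the observed spread of the estimates rather than from an estimated variance, departures from nominal coverage under this definition reflect bias or non-normality of the estimator, not errors in variance estimation.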

Theorems & Definitions (10)

  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Lemma 5.1
  • Remark 5
  • Lemma 6.1
  • Lemma 6.2
  • proof
  • proof