Efficient Targeted Maximum Likelihood Estimators for Two-Phase Design Problems

Sky Qiu; Susan Gruber; Pamela A. Shaw; Brian D. Williamson; Mark J. van der Laan

Abstract

In a typical two-phase design, a random sample is drawn from the target population in phase 1, during which only a subset of variables is collected. In phase 2, a subsample of the phase-1 cohort is selected, and additional variables are measured. This setting induces a coarsened data structure on the data from the second phase. We assume coarsening at random, that is, the phase-2 sampling mechanism depends only on fully observed variables. We review existing estimators, including the generalized raking estimator and the inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE), along with its extensions that also target the phase-2 sampling mechanism to improve efficiency. We further introduce a new class of estimators constructed within the TMLE framework that are asymptotically equivalent.
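To make the data structure concrete, the following is a minimal Python sketch of a two-phase design under coarsening at random. It is not the paper's IPCW-TMLE or any of its estimators: it only contrasts a naive complete-case mean with a simple inverse-probability-weighted (Hajek-type) mean, assuming the phase-2 sampling probabilities are known. All names (`simulate_two_phase`, `complete_case_mean`, `ipcw_mean`) and the data-generating process are hypothetical and chosen purely for illustration.

```python
import random


def simulate_two_phase(n, seed=0):
    """Simulate a toy two-phase sample (hypothetical data-generating process).

    Phase 1: a binary covariate w is observed for everyone.
    Phase 2: the outcome y is measured only if delta == 1, and the
    selection probability pi depends on w alone (coarsening at random).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        y = 2.0 * w + rng.gauss(0.0, 1.0)   # true mean of y is 1.0
        pi = 0.8 if w == 1 else 0.2         # known phase-2 sampling probability
        delta = 1 if rng.random() < pi else 0
        # y is recorded as None when the unit is not selected in phase 2
        rows.append((w, delta, y if delta else None, pi))
    return rows


def complete_case_mean(rows):
    """Unweighted mean of y among phase-2 units (biased when pi varies with w)."""
    ys = [y for (_, delta, y, _) in rows if delta == 1]
    return sum(ys) / len(ys)


def ipcw_mean(rows):
    """Inverse-probability-weighted mean of y with normalized (Hajek) weights."""
    num = sum(y / pi for (_, delta, y, pi) in rows if delta == 1)
    den = sum(1.0 / pi for (_, delta, _, pi) in rows if delta == 1)
    return num / den


rows = simulate_two_phase(20000, seed=1)
naive = complete_case_mean(rows)
weighted = ipcw_mean(rows)
print(f"complete-case mean: {naive:.3f}")
print(f"IPCW mean:          {weighted:.3f}")
```

In this toy setup the complete-case mean over-represents units with `w == 1`, while the weighted mean corrects for the unequal selection probabilities. The estimators discussed in the paper go further than this sketch, which does not attempt any targeting step or efficiency gain.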

Paper Structure (31 sections, 3 theorems, 88 equations, 1 figure, 8 tables, 3 algorithms)

Introduction
Preliminaries
Notations and terminology
The efficient influence curve
Targeted maximum likelihood estimation
The exact remainder term
Review of existing estimators for two-phase sampling designs
Inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE)
IPCW-TMLE with targeted phase-2 sampling mechanism
Generalized raking (GR)
Two estimators that rely on a slight rearrangement of the A-IPCW representation of the EIC
Efficient estimating-equation (EEE) estimator
Quasi-TMLE
A TMLE using an alternative representation of the target parameter
Analyzing the exact remainder terms
...and 16 more sections

Key Result

Lemma 5.1

Define [...]. The canonical gradient of the parameter $\Psi:\mathcal{M}\rightarrow\mathbb{R}$ at $P$, where [...], is given by [...], where [...].

Figures (1)

Figure 1: Oracle coverage of Wald-type 95% confidence intervals for raking, IPCW-TMLE, and IPCW-TMLE with targeting of $\Pi$. Oracle coverage is defined as the proportion of Monte Carlo runs (out of 1,000) in which the 95% confidence interval computed using the empirical standard error covers the true causal estimand.

Theorems & Definitions (10)

Remark 1
Remark 2
Remark 3
Remark 4
Lemma 5.1
Remark 5
Lemma 6.1
Lemma 6.2
proof
proof
