Surrogate-Assisted Targeted Learning for Delayed Outcomes under Administrative Censoring

Lin Li

Abstract

Delayed primary outcomes and administratively censored follow-up create a general semiparametric estimation problem: the target causal functional depends on an endpoint observed only for a shrinking subset of units at analysis time, while earlier surrogate measurements remain widely available. In such settings, inverse-probability-weighted estimators can become unstable as observation probabilities approach the positivity boundary, and complete-case model-based analyses can be highly sensitive to outcome-model specification. We develop a surrogate-assisted targeted minimum loss estimator for this nested causal functional. Identification proceeds through a surrogate-bridge representation that integrates an observed-outcome regression over the conditional surrogate distribution, thereby avoiding inverse observation weights in the target parameter itself. We show that the estimator is asymptotically linear and doubly robust (in the sense that first-order bias vanishes when either nuisance component is consistently estimated), and we characterize two structural features of the problem. First, under surrogate-mediated missing at random, the censoring mechanism contributes no separate tangent-space component to the efficient influence function. Second, for nested bridge functionals, a one-step debiased machine-learning construction leaves a second-order cross-product remainder involving the conditional surrogate law. The proposed two-stage targeting step removes this term without requiring direct estimation of that law. Simulation studies demonstrate stable finite-sample performance under substantial administrative censoring, and a design-calibrated analysis based on the Washington State EPT study illustrates the method in a realistic stepped-wedge cluster-randomized setting.
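The surrogate-bridge idea in the abstract can be illustrated with a minimal sketch. This is not the paper's SA-TMLE: it is only an untargeted nested-regression plug-in, under a simplified point-treatment setting assumed here for illustration (no clusters, no time waves, linear working models). In that simplified setting the bridge functional is taken to be $\Psi = E\big[E\{\bar Q_Y(S,1,W)\mid A=1,W\}\big] - E\big[E\{\bar Q_Y(S,0,W)\mid A=0,W\}\big]$ with $\bar Q_Y(S,A,W) = E(Y \mid \Delta=1,S,A,W)$. The function names, the data-generating process, and the coefficient values are all hypothetical.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients for y ~ [1, X]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def ols_predict(beta, X):
    """Fitted values from coefficients returned by ols_fit."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ beta

def bridge_plugin(W, A, S, Y, D):
    """Nested-regression plug-in; uses no inverse observation weights."""
    obs = D == 1
    feats = np.column_stack([S, A, W])
    # Inner regression: E[Y | Delta=1, S, A, W], fit on observed outcomes only.
    b_y = ols_fit(feats[obs], Y[obs])
    means = {}
    for a in (0, 1):
        arm = A == a
        # Predicted outcome for every unit in arm a (S is observed for all).
        q_y = ols_predict(b_y, feats[arm])
        # Outer regression: integrate over S | A=a, W by regressing q_y on W.
        b_s = ols_fit(W[arm].reshape(-1, 1), q_y)
        # Average the outer fit over the full-sample distribution of W.
        means[a] = ols_predict(b_s, W.reshape(-1, 1)).mean()
    return means[1] - means[0]

def simulate(n, rng):
    """Hypothetical data: censoring depends on the surrogate S, not on Y."""
    W = rng.normal(size=n)
    A = rng.integers(0, 2, size=n)
    S = 0.5 * W + 1.0 * A + rng.normal(size=n)
    Y = 1.0 * S + 0.5 * W + 0.3 * A + rng.normal(size=n)  # true effect: 1.3
    p_obs = 1.0 / (1.0 + np.exp(-(0.5 - 1.0 * S)))        # surrogate-mediated MAR
    D = rng.binomial(1, p_obs)
    Y = np.where(D == 1, Y, np.nan)                       # outcome hidden if censored
    return W, A, S, Y, D

rng = np.random.default_rng(0)
psi_hat = bridge_plugin(*simulate(20000, rng))
print(f"bridge plug-in estimate: {psi_hat:.3f} (simulated truth 1.3)")
```

Because the working models are correctly specified in this toy setup, the plug-in should land near the simulated truth. With flexible learners in place of least squares, a targeting step of the kind the abstract describes would be needed to remove first-order bias.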

Paper Structure (47 sections, 6 theorems, 45 equations, 3 figures, 6 tables)

Introduction
Limitations of standard approaches.
Contributions.
Related statistical work.
Data Structure and Identification
Potential Outcomes and the Causal Target
Non-Parametric Identification
Semiparametric Theory
EIC Decomposition
The Nested Cross-Product Remainder and the Necessity of Two-Stage Targeting
Surrogate-Assisted TMLE Construction
Stage 1: Initial Estimation via Super Learner
Stage 2: The Nested Fluctuation Step
Clever covariate for the surrogate integration model.
Convergence criterion.
...and 32 more sections

Key Result

Theorem 1

Under Assumptions \ref{ass:consistency}--\ref{ass:pos_trt}, the causal ATE is identified by the longitudinal G-computation formula:

Figures (3)

Figure 1: Directed Acyclic Graph for the SW-CRT Structural Causal Model. Red circles: endogenous variables. Blue circles: exogenous and design variables. Violet arrows: cluster random effect $b_j$, the source of ICC. The critical structural feature is the absence of a directed edge $Y_{ijt}\!\to\!\Delta_{ijt}$, encoding Assumption \ref{ass:mar}: once $S_{ijt}$ is observed, the censoring probability depends on $S$ but not directly on the unobserved $Y$. The secular time trend $t$ acts as a common cause of all endogenous variables and must be adjusted for in estimation.
Figure 2: Block III: bias (A), coverage (B), and power (C) under increasing censoring ($J=30$, $t_{\mathrm{lag}}=3$, 1,000 replicates). CV-TMLE bias stays near zero while GLMM/IPCW bias grows to 0.32. High GLMM/IPCW rejection rates at heavy censoring reflect bias, not signal (near-zero coverage).
Figure 3: Oracle comparison for the design-calibrated EPT illustration (calibrated to \cite{golden2015uptake}). Horizontal bars are 95% cluster-robust confidence intervals; the vertical dashed line marks zero; the dotted line marks the known oracle $\Psi_0 = -0.0039$. All three estimators cover the oracle, with point estimates within 0.003 of the truth. The key comparison is CI width: IPCW (width 0.068) is twice as wide as SA-TMLE (0.034), reflecting variance inflation from near-zero wave-4 censoring weights. GLMM achieves the narrowest CI (0.026) at the cost of model dependence. This figure illustrates oracle coverage under a calibrated design, not the causal effect of EPT.

Theorems & Definitions (18)

Remark 1: Intra-cluster correlation
Remark 2: Practical plausibility of Assumption \ref{ass:mar}
Remark 3: Support positivity versus inverse-weighting regularity
Remark 4: Why Treatment Positivity Holds Marginally Despite Pointwise Violations
Theorem 1: Identification via a Surrogate Bridge under Support Positivity
proof
Lemma 1: Vanishing $\mathcal{T}_\Delta$ Component
proof
Lemma 2: Cluster-Level Summation Rule
proof
...and 8 more
