Experiment-selector cross-validated targeted maximum likelihood estimator for hybrid RCT-external data studies

Lauren Eyler Dang, Jens Magelund Tarp, Trine Julie Abrahamsen, Kajsa Kvist, John B Buse, Maya Petersen, Mark van der Laan

TL;DR

The paper introduces ES-CVTMLE, a cross-validated targeted maximum likelihood estimator that combines an RCT with external data. It does this by selecting the experiment (RCT alone, or RCT plus external data) with the best estimated bias-variance tradeoff. It builds a causal framework for hybrid RCT-external data studies and defines the causal gap between the pooled and trial-only estimands. Two signals guide data fusion: a bias estimate based on the primary outcome, and an estimate of the average treatment effect (ATE) on a negative control outcome (NCO). Estimation uses CV-TMLE with cross-fitting. Confidence intervals are obtained by Monte Carlo sampling from the estimated limit distribution, which accounts for the data-adaptive experiment selection. The method was evaluated in simulations and in an analysis of the LEADER trial (liraglutide and HbA1c). It gains power when the external data are unbiased and keeps inference robust when they are biased. It also distinguishes biased from unbiased external controls through the NCO and bias-variance criteria.
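The following minimal Python sketch illustrates the selection step. It assumes a simplified criterion: squared bias estimate, plus squared NCO ATE estimate, plus estimated influence-curve variance divided by n. The criterion is applied once to the full sample. The paper's exact criterion, its fold-wise (cross-validated) application, and the TMLE steps that produce these inputs are not reproduced here. The `Candidate` class, the `score` and `select` functions, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-experiment summary used by the selector."""
    name: str
    bias: float      # estimated bias relative to the RCT-only estimand (0 for RCT alone)
    nco_ate: float   # estimated ATE on the negative control outcome (0 for RCT alone)
    variance: float  # estimated influence-curve variance of the ATE estimator


def score(c: Candidate, n: int) -> float:
    # Assumed criterion: squared bias + squared NCO ATE + variance of the estimator (variance / n).
    return c.bias ** 2 + c.nco_ate ** 2 + c.variance / n


def select(candidates: list[Candidate], n: int) -> Candidate:
    # Pick the experiment with the smallest estimated bias-variance score.
    return min(candidates, key=lambda c: score(c, n))


if __name__ == "__main__":
    n = 400
    rct = Candidate("RCT only", bias=0.0, nco_ate=0.0, variance=4.0)
    pooled_ok = Candidate("RCT + external", bias=0.05, nco_ate=0.02, variance=2.0)
    pooled_bad = Candidate("RCT + external", bias=0.20, nco_ate=0.15, variance=2.0)
    print(select([rct, pooled_ok], n).name)   # RCT + external (0.0079 < 0.01)
    print(select([rct, pooled_bad], n).name)  # RCT only (0.0675 > 0.01)
```

In this toy setting, pooling halves the variance. That variance reduction is only worth taking when the bias and NCO signals are small relative to it.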

Abstract

Augmenting a randomized controlled trial (RCT) with external data may increase power at the risk of introducing bias. To select and analyze the experiment (RCT alone or combined with external data) with the optimal bias-variance tradeoff, we develop a novel experiment-selector cross-validated targeted maximum likelihood estimator for randomized-external data studies (ES-CVTMLE). This estimator uses two estimates of bias to determine whether to integrate external data: 1) a function of the difference in conditional mean outcome under control between the RCT and combined experiments, and 2) an estimate of the average treatment effect on a negative control outcome (NCO). We define the asymptotic distribution of the ES-CVTMLE under varying magnitudes of bias and construct confidence intervals by Monte Carlo simulation. We compare ES-CVTMLE with three other data fusion estimators in simulations. In a real data analysis of the effect of liraglutide on glycemic control from the LEADER trial, we demonstrate the ability of ES-CVTMLE to distinguish biased from unbiased external controls. The ES-CVTMLE has the potential to improve power while providing relatively robust inference for future hybrid RCT-external data studies.
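The two bias signals named in the abstract can be illustrated with simple plug-in estimates. The sketch below is a hypothetical simplification, not the ES-CVTMLE procedure, and makes three substitutions:

- It fits linear outcome regressions instead of using TMLE with cross-fitting.
- It averages the difference in predicted control outcomes (pooled fit minus RCT-only fit) over the pooled control covariates.
- It uses an unadjusted difference in means for the NCO ATE.

The data-generating process (a constant `shift` added to external controls) and all function names are illustrative assumptions.

```python
import numpy as np


def fit_linear(w, y):
    """Least-squares fit of y ~ 1 + w; returns [intercept, slope]."""
    x = np.column_stack([np.ones_like(w), w])
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    return coef


def control_outcome_bias(w_rct0, y_rct0, w_ext0, y_ext0):
    """Plug-in estimate of mean_W [E(Y | A=0, W, pooled) - E(Y | A=0, W, RCT)],
    averaged over the pooled control covariates (simplifying assumption)."""
    w_pool = np.concatenate([w_rct0, w_ext0])
    y_pool = np.concatenate([y_rct0, y_ext0])
    coef_rct = fit_linear(w_rct0, y_rct0)
    coef_pool = fit_linear(w_pool, y_pool)
    pred_pool = coef_pool[0] + coef_pool[1] * w_pool
    pred_rct = coef_rct[0] + coef_rct[1] * w_pool
    return float(np.mean(pred_pool - pred_rct))


def nco_ate(nco_treated, nco_controls):
    """Unadjusted difference in mean NCO: treated arm minus pooled controls."""
    return float(np.mean(nco_treated) - np.mean(nco_controls))


def simulate(shift, n_rct=200, n_ext=800, seed=1):
    """Hypothetical data: external controls differ from RCT controls by `shift`
    in both the primary outcome and the NCO (treatment has no effect on the NCO)."""
    rng = np.random.default_rng(seed)
    w_rct0 = rng.normal(size=n_rct)
    w_ext0 = rng.normal(size=n_ext)
    y_rct0 = 1.0 + 2.0 * w_rct0 + rng.normal(size=n_rct)
    y_ext0 = 1.0 + 2.0 * w_ext0 + shift + rng.normal(size=n_ext)
    nco_treated = rng.normal(size=n_rct)
    nco_controls = np.concatenate(
        [rng.normal(size=n_rct), shift + rng.normal(size=n_ext)]
    )
    bias = control_outcome_bias(w_rct0, y_rct0, w_ext0, y_ext0)
    return bias, nco_ate(nco_treated, nco_controls)


if __name__ == "__main__":
    for shift in (0.0, 1.0):
        bias, nco = simulate(shift)
        print(f"shift={shift}: bias estimate={bias:.3f}, NCO ATE={nco:.3f}")
```

With `shift = 0`, both signals should stay near zero. With `shift = 1`, both move away from zero, which is the pattern a selector would use to fall back to the RCT alone.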

Paper Structure (29 sections, 28 equations, 9 figures, 6 tables)

Figures (9)

  • Figure 1: Structural Causal Model
  • Figure 2: Directed Acyclic Graph Including NCO
  • Figure 3: Steps for obtaining a point estimate for the ES-CVTMLE target parameter.
  • Figure 4: Steps for obtaining confidence intervals for the ES-CVTMLE target parameter.
  • Figure 5: Change in HbA1c by Trial Arm and Region Over Time
  • ...and 4 more figures