Adaptive Selection of the Optimal Strategy to Improve Precision and Power in Randomized Trials

Laura B. Balzer; Erica Cai; Lucas Godoy Garraza; Pracheta Amaranath

TL;DR

In simulations, the approach maintains Type-I error control and offers substantial gains in precision, equivalent to 20%-43% reductions in sample size for the same statistical power; when applied to real data from ACTG Study 175, it also yields meaningful efficiency improvements overall and within subgroups.

Abstract

Benkeser et al. demonstrate how adjustment for baseline covariates in randomized trials can meaningfully improve precision for a variety of outcome types. Their findings build on a long history, starting in 1932 with R.A. Fisher and including more recent endorsements by the U.S. Food and Drug Administration and the European Medicines Agency. Here, we address an important practical consideration: *how* to select the adjustment approach (which variables, and in which form) to maximize precision while maintaining Type-I error control. Balzer et al. previously proposed *Adaptive Prespecification* within TMLE to flexibly and automatically select, from a prespecified set, the approach that maximizes empirical efficiency in small trials ($N<40$). To avoid overfitting with few randomized units, selection was previously limited to working generalized linear models adjusting for a single covariate. Now, we tailor Adaptive Prespecification to trials with many randomized units. Using $V$-fold cross-validation and the estimated influence curve squared as the loss function, we select from an expanded set of candidates, including modern machine learning methods adjusting for multiple covariates. In simulations exploring a variety of data generating processes, our approach maintains Type-I error control under the null and offers substantial gains in precision, equivalent to 20-43% reductions in sample size for the same statistical power. When applied to real data from ACTG Study 175, we also see meaningful efficiency improvements overall and within subgroups.
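The cross-validated selection step described in the abstract can be sketched as follows. This is a minimal illustration under simplifying assumptions that are ours, not the authors' implementation: the effect is the average treatment effect, the randomization probability is known ($g=0.5$), the candidates are hypothetical linear working models for $\mathbb{E}(Y|A,W)$ indexed by which columns of $W$ they adjust for, and both the TMLE targeting step and the propensity-score selection stage are omitted, so each candidate's estimator reduces to an augmented inverse-probability-weighted form. For each candidate, the influence curve is estimated on every validation fold from a fit on the training folds, and the candidate with the smallest cross-validated mean of the squared influence curve (that is, the smallest estimated variance) is selected. The function names `cv_ic_variance` and `select_adjustment` are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ 1 + X."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def cv_ic_variance(Y, A, W, covs, g=0.5, V=5, seed=0):
    """Cross-validated mean of the squared estimated influence curve for an
    AIPW-type ATE estimator whose (hypothetical) working outcome regression is
    linear in A and the columns `covs` of W. Assumes known P(A=1|W) = g and
    omits the TMLE targeting step."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % V  # fold label per unit
    ic = np.empty(n)
    for v in range(V):
        train, valid = folds != v, folds == v
        W_tr, W_va = W[train][:, covs], W[valid][:, covs]
        beta = fit_ols(np.column_stack([A[train], W_tr]), Y[train])
        m = int(valid.sum())
        q1 = predict_ols(beta, np.column_stack([np.ones(m), W_va]))   # E(Y|A=1,W)
        q0 = predict_ols(beta, np.column_stack([np.zeros(m), W_va]))  # E(Y|A=0,W)
        qa = np.where(A[valid] == 1, q1, q0)
        h = A[valid] / g - (1 - A[valid]) / (1 - g)  # clever covariate, known g
        phi = h * (Y[valid] - qa) + q1 - q0          # uncentered estimating function
        ic[valid] = phi - phi.mean()                 # center at the fold's estimate
    return float(np.mean(ic ** 2))


def select_adjustment(Y, A, W, candidates, V=5, seed=0):
    """Return the candidate with the smallest CV influence-curve variance.
    The same seed gives every candidate the same fold split."""
    losses = {name: cv_ic_variance(Y, A, W, covs, V=V, seed=seed)
              for name, covs in candidates.items()}
    return min(losses, key=losses.get), losses


# Simulated trial (illustrative only): W1 is prognostic, W2 and W3 are noise.
rng = np.random.default_rng(1)
n = 500
W = rng.normal(size=(n, 3))
A = rng.binomial(1, 0.5, size=n)
Y = 1.0 * A + 2.0 * W[:, 0] + rng.normal(size=n)

candidates = {"unadjusted": [], "W1": [0], "W1+W2+W3": [0, 1, 2]}
best, losses = select_adjustment(Y, A, W, candidates)
print(best, {k: round(v, 2) for k, v in losses.items()})
```

Because the loss is the estimated variance of each candidate's estimator, the selected candidate is the one that maximizes empirical efficiency; the unadjusted candidate is always in the set, so adjustment is only chosen when the data suggest it helps.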

Paper Structure

This paper contains 6 sections, 7 equations, 2 figures, and 10 tables.

Introduction
Methods
Adaptive Prespecification (APS)
Simulation Studies
Real data application: ACTG Study 175
Discussion

Figures (2)

Figure 1: Schematic of Adaptive Prespecification (APS) within TMLE to flexibly and automatically select, from a prespecified set, the adjustment approach that maximizes empirical efficiency for the effect of interest. For illustration, we show $R$ candidate outcome regression estimators $\mathbb{E}(Y|A,W)$, $P$ candidate propensity score estimators $\mathbb{P}(A=1|W)$, and $V=5$ fold cross-validation (CV). For simplicity, we show the process for the first and last folds and use ellipses to indicate an analogous process for the other folds. Let $K_v$ denote the set of indices for the observations in fold $v$, of size $|K_v|=n_v$. For observation $k$ in validation set $v$, the CV-influence curve estimate for the TMLE using candidate outcome regression $r$ but no targeting is denoted $\hat{D}_r^{-v}(O_k)$ in Step 4, while the corresponding CV-estimate of the influence curve for the TMLE using the selected outcome regression $\star$ and targeting with candidate propensity score estimator $p$ is denoted $\hat{D}_{\star p}^{*,-v}(O_k)$ (Appendix A).
Figure 2: Across 5000 simulated trials with a binary outcome (top) and with a continuous outcome (bottom), the estimated savings in sample size (%), relative to the unadjusted estimator, when using forced adjustment for $W_1$ in the outcome regression ("Static"), TMLE with the small-trial implementation of Adaptive Prespecification ("Small APS"), or TMLE with the large-trial implementation ("Large APS"). Results are shown for the 3 data generating processes with prognostic covariates, under simple and under stratified randomization.
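The sample-size savings in Figure 2 compare each estimator's variance with that of the unadjusted estimator. A common conversion, which we assume here rather than take from the paper, is that for a fixed effect size, significance level, and power, the sample size required by a Wald-type test scales linearly with the estimator's variance. Savings then equal $1 - 1/\mathrm{RE}$, where $\mathrm{RE} = \mathrm{Var}(\text{unadjusted})/\mathrm{Var}(\text{adjusted})$ is the relative efficiency. The helper `sample_size_savings` is hypothetical, and the variance ratios below are illustrative inputs, not results from the paper.

```python
def sample_size_savings(var_unadjusted: float, var_adjusted: float) -> float:
    """Percent reduction in sample size for the same power, assuming the
    required n scales linearly with the estimator's variance."""
    if var_unadjusted <= 0 or var_adjusted <= 0:
        raise ValueError("variances must be positive")
    relative_efficiency = var_unadjusted / var_adjusted
    return 100.0 * (1.0 - 1.0 / relative_efficiency)


print(round(sample_size_savings(1.0, 0.8), 1))   # 20.0
print(round(sample_size_savings(1.0, 0.57), 1))  # 43.0
```

Under this conversion, a 20% saving corresponds to a relative efficiency of 1.25, and a 43% saving to a relative efficiency of about 1.75.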