Powering RCTs for marginal effects with GLMs using prognostic score adjustment
Emilie Højbjerre-Frandsen, Mark J. van der Laan, Alejandro Schuler
TL;DR
This work extends prognostic-score adjustment from linear models to generalized linear models (GLMs) for estimating marginal treatment effects in randomized trials, using historical control data to improve efficiency without introducing bias. It proves local semi-parametric efficiency under a treatment effect that is additive on the link scale, extends the approach to negative binomial outcomes, and derives a practical variance and power framework that can use historical data for prospective trial planning. Through simulations and a diabetes case study, the authors show consistent type I error control and improved precision under population shifts, with the largest gains when the historical data are well matched to the trial population. They also give guidance on prognostic-score construction, cross-fitting, preregistration in the statistical analysis plan (SAP), and case-study interpretation, and they position GLM prognostic adjustment as a robust alternative to broader data-fusion approaches that is easier for regulators to accept.
Abstract
In randomized clinical trials (RCTs), the accurate estimation of marginal treatment effects is crucial for determining the efficacy of interventions. Enhancing the statistical power of these analyses is a key objective for statisticians. The increasing availability of historical data from registries, prior trials, and health records presents an opportunity to improve trial efficiency. However, many methods for historical borrowing compromise strict type-I error rate control. Building on the work by Schuler et al. [2022] on prognostic score adjustment for linear models, this paper extends the methodology to the plug-in analysis proposed by Rosenblum et al. [2010] using generalized linear models (GLMs) to further enhance the efficiency of RCT analyses without introducing bias. Specifically, we train a prognostic model on historical control data and incorporate the resulting prognostic scores as covariates in the plug-in GLM analysis of the trial data. This approach leverages the predictive power of historical data to improve the precision of marginal treatment effect estimates. We demonstrate that this method achieves local semi-parametric efficiency under the assumption of an additive treatment effect on the link scale. We expand the GLM plug-in method to include negative binomial regression. Additionally, we provide a straightforward formula for conservatively estimating the asymptotic variance, facilitating power calculations that reflect these efficiency gains. Our simulation study supports the theory. Even without an additive treatment effect, we observe increased power or reduced standard error. While population shifts from historical to trial data may dilute benefits, they do not introduce bias.
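The procedure described in the abstract (fit a prognostic model on historical controls, add its prediction as a covariate in the trial GLM, then average the predictions under each arm) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the simulated data-generating process, the single covariate `w`, the use of a plain logistic regression as the prognostic model, the use of the predicted control probability as the score, and all function names. The sketch uses only the Python standard library and reports a risk difference as one possible marginal effect.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(A, b):
    """Solve a small linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Fit a logistic GLM by Newton-Raphson; each row of X includes the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for xi, yi in zip(X, y):
            mu = expit(sum(b * v for b, v in zip(beta, xi)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    hess[j][k] += w * xi[j] * xi[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def simulate(n, rng, randomize):
    """Hypothetical data: covariate w, treatment a, binary outcome y."""
    data = []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        a = int(rng.random() < 0.5) if randomize else 0
        y = int(rng.random() < expit(-0.5 + 1.0 * w + 0.8 * a))
        data.append((w, a, y))
    return data


rng = random.Random(2024)
historical = simulate(2000, rng, randomize=False)  # controls only
trial = simulate(1000, rng, randomize=True)        # 1:1 randomized

# Step 1: prognostic model trained on historical controls only.
gamma = fit_logistic([[1.0, w] for w, _, _ in historical],
                     [y for _, _, y in historical])

# Step 2: prognostic score (predicted control outcome) for each trial participant.
scores = [expit(gamma[0] + gamma[1] * w) for w, _, _ in trial]

# Step 3: trial GLM with treatment and the prognostic score as covariates.
beta = fit_logistic([[1.0, float(a), s] for (_, a, _), s in zip(trial, scores)],
                    [y for _, _, y in trial])

# Step 4: plug-in step: predict every participant under each arm, then average.
mu0_hat = sum(expit(beta[0] + beta[2] * s) for s in scores) / len(scores)
mu1_hat = sum(expit(beta[0] + beta[1] + beta[2] * s) for s in scores) / len(scores)
risk_difference = mu1_hat - mu0_hat
```

Other marginal contrasts, such as a risk ratio or odds ratio, could be formed from the same `mu0_hat` and `mu1_hat`. In practice the prognostic model would typically be a more flexible learner than the logistic regression used here, and the sketch omits variance estimation entirely.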
