Powering RCTs for marginal effects with GLMs using prognostic score adjustment

Emilie Højbjerre-Frandsen, Mark J. van der Laan, Alejandro Schuler

TL;DR

This work extends prognostic-score adjustment from linear models to generalized linear models (GLMs) for estimating marginal treatment effects in randomized trials, using historical control data to improve efficiency without introducing bias. It proves local semi-parametric efficiency under a treatment effect that is additive on the link scale, extends the plug-in method to negative binomial outcomes, and derives a practical variance and power framework that can use historical data for prospective trial planning. In simulations and a diabetes case study, the authors show that type I error control is maintained and that precision improves even under population shifts, with the largest gains when the historical data are well matched to the trial population. They also give guidance on prognostic-score construction, cross-fitting, SAP preregistration, and case-study interpretation, and present GLM prognostic adjustment as a robust alternative to broader data-fusion approaches that is easier to justify to regulators.

Abstract

In randomized clinical trials (RCTs), the accurate estimation of marginal treatment effects is crucial for determining the efficacy of interventions. Enhancing the statistical power of these analyses is a key objective for statisticians. The increasing availability of historical data from registries, prior trials, and health records presents an opportunity to improve trial efficiency. However, many methods for historical borrowing compromise strict type-I error rate control. Building on the work by Schuler et al. [2022] on prognostic score adjustment for linear models, this paper extends the methodology to the plug-in analysis proposed by Rosenblum et al. [2010] using generalized linear models (GLMs) to further enhance the efficiency of RCT analyses without introducing bias. Specifically, we train a prognostic model on historical control data and incorporate the resulting prognostic scores as covariates in the plug-in GLM analysis of the trial data. This approach leverages the predictive power of historical data to improve the precision of marginal treatment effect estimates. We demonstrate that this method achieves local semi-parametric efficiency under the assumption of an additive treatment effect on the link scale. We expand the GLM plug-in method to include negative binomial regression. Additionally, we provide a straightforward formula for conservatively estimating the asymptotic variance, facilitating power calculations that reflect these efficiency gains. Our simulation study supports the theory. Even without an additive treatment effect, we observe increased power or reduced standard error. While population shifts from historical to trial data may dilute benefits, they do not introduce bias.
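The workflow described in the abstract (fit a prognostic model on historical controls, add its prediction as a covariate in the trial GLM, then average counterfactual predictions) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the simulated data, the quadratic logistic prognostic model, and the helper names `fit_logistic`, `features`, and `simulate` are all hypothetical, and the paper allows any prognostic model, not just the simple one used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=100, ridge=1e-8):
    """Logistic GLM fit by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        step = np.linalg.solve(hess, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def features(W):
    """Hypothetical feature map for the prognostic model (intercept, W, W^2)."""
    return np.column_stack([np.ones(len(W)), W, W ** 2])

def simulate(n, randomize):
    """Assumed data-generating process, for illustration only."""
    W = rng.normal(size=(n, 2))
    A = rng.binomial(1, 0.5, size=n) if randomize else np.zeros(n)
    Y = rng.binomial(1, expit(-0.5 + W[:, 0] ** 2 - W[:, 1] + 0.8 * A))
    return W, A, Y

# Step 1: train the prognostic model on historical control data only.
W_hist, _, Y_hist = simulate(5000, randomize=False)
beta_hist = fit_logistic(features(W_hist), Y_hist)

# Step 2: compute the prognostic score for each trial participant.
W, A, Y = simulate(2000, randomize=True)
score = expit(features(W) @ beta_hist)

# Step 3: fit the trial GLM with treatment, covariates, and the score.
X = np.column_stack([np.ones(len(A)), A, W, score])
beta = fit_logistic(X, Y)

# Step 4: plug-in step. Predict for everyone under A=1 and under A=0,
# then average to obtain the two marginal means.
X1, X0 = X.copy(), X.copy()
X1[:, 1], X0[:, 1] = 1.0, 0.0
mu1, mu0 = expit(X1 @ beta).mean(), expit(X0 @ beta).mean()
effect = mu1 - mu0  # marginal risk difference; other contrasts of mu1, mu0 work too

print(f"mu1={mu1:.3f}  mu0={mu0:.3f}  risk difference={effect:.3f}")
```

The prognostic score enters the trial model as one extra baseline covariate, so the historical data are never pooled with the trial outcomes. Variance estimation and the power formula discussed in the paper are not covered by this sketch.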

Paper Structure

This paper contains 34 sections, 10 theorems, 59 equations, 9 figures, 5 tables.

Key Result

Theorem 1

Let $\mathbb{P}_n$ and $\mathbb{P}_{\widetilde{n}}$ be the empirical distributions of $n$ and $\widetilde{n}$ draws from $P(W,A,Y \mid D=1)$ and $P(W,Y \mid D=0)$, respectively. Presume that the number of participants $n$ in the current trial increases such that $n=\mathcal{O}\left(\widetilde{n}\right)$. Furtherm

Figures (9)

  • Figure 1: Estimated efficiencies of each estimator relative to standard covariate adjustment for $n = 250$. The box-and-whisker plot shows the distribution of the empirical efficiencies across replicates. Only the prognostically-adjusted estimators are shown for the shift scenarios since the other estimators do not make any use of the historical data and therefore do not change.
  • Figure 2: Empirical coverage of each estimator across scenarios as trial sample size varies.
  • Figure 3: Empirical percentage of significant results (power, or type I error for the null scenario) of each estimator with increasing trial sample size across scenarios with no shift and unobserved covariate shifts. Vertical dashed lines indicate the average sample size estimated to attain 80% power by each method. Equivalent plots for the observed covariate shifts show the same pattern so we omit them to reduce visual clutter. The estimated sample sizes when adjusting for only the prognostic score or for the noise prognostic score plus covariates are the same as that for prognostic adjustment with covariates.
  • Figure 4: Conceptual illustration of various influence functions. All influence functions (up to an offset of $\phi$) are orthogonal to the tangent space of $\mathcal{M}$ at $\mathcal{P}$.
  • Figure 5: Conceptual illustration of the limiting regressions $\overline\mu$ and $\widetilde{\mu}$ as projections of $\mu$ onto their respective nested models.
  • ...and 4 more figures

Theorems & Definitions (18)

  • Theorem 1
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • Theorem 6
  • proof
  • ...and 8 more