Correcting for Missing Data When Evaluating Surrogate Markers in a Clinical Trial

Sarah C. Lotspeich, P. D. Anh. Nguyen, Layla Parast

Abstract

Evaluating treatment effects is critical in clinical trials but sometimes involves lengthy, invasive, or costly follow-up procedures. In these cases, surrogate markers, which provide intermediate measures of the long-term treatment effect, allow clinicians to obtain results faster and more efficiently than would have otherwise been possible. Prior to adoption, it is vital that the utility of surrogate markers (i.e., their ability to capture the treatment effect on the primary outcome) is statistically validated. Many frameworks for evaluating surrogate markers have been proposed, but they do not account for missing data. Instead, they rely on complete cases (the subset of patients without missing data), which can be inefficient and biased. To improve on this, we propose methods to accommodate missing data in nonparametric and parametric surrogate evaluation via inverse probability weighting (IPW) and semiparametric maximum likelihood estimation (SMLE). Through simulation studies, we demonstrate that the proposed methods remain unbiased under a broader range of missing data mechanisms than complete case analysis and can help retain the statistical precision of the full trial. We illustrate their practical utility through an application to a diabetes clinical trial. Moreover, our missing data corrections have complementary strengths with respect to computational ease, robustness, and statistical efficiency. All methods are implemented in the MissSurrogate R package.

Paper Structure (22 sections, 29 equations, 3 figures, 4 tables)

Figures (3)

  • Figure 1: The nonparametric estimator $\widehat{R}_S$ of the proportion of treatment effect explained (PTE) assumes that the distributions of the surrogate marker $S$ in the treatment and control groups overlap (left panel). Setting $5$ considers the case where these distributions do not fully overlap (right panel).
  • Figure 2: Simulation results for $\widehat{R}_S$, the estimator of the proportion of treatment effect explained (PTE), in Setting $4$, when the surrogate marker $S$ was missing at random given the primary outcome $Y$ and its interaction $Y \times Z$ with treatment group $Z$ (top row) or given the primary outcome $Y$ alone (bottom row).
  • Figure 3: Distributions of the surrogate (change in glucose; top left panel) and the primary outcome (change in LDL cholesterol; top right panel) by treatment group, and the surrogate versus the primary outcome (bottom panel) within each treatment group in the Diabetes Control and Complications Trial. The solid lines in the bottom panel are loess smoothers, and the shaded areas represent the $95\%$ confidence intervals around them.