Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data

Rickard Karlsson, Piersilvio De Bartolomeis, Issa J. Dahabreh, Jesse H. Krijthe

TL;DR

The paper addresses the estimation of heterogeneous treatment effects in randomized trials when external data from other trials or observational studies are available but may not be aligned with the trial population. It introduces the QR-learner, a model-agnostic approach that uses randomization-aware pseudo-outcomes to estimate the CATE within the trial population while leveraging external data to reduce estimation error and increase power. A complementary procedure combines the QR-learner with a trial-only DR-learner so that the combined estimator asymptotically attains a mean squared error no worse than that of either component, with the combination tuned by cross-validation. In simulations and a case study on the STAR dataset, the authors show that the QR-learner can improve CATE accuracy and statistical power even when the external data are imperfectly aligned with the trial.
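To make the notion of a randomization-aware pseudo-outcome concrete, the following is a minimal sketch of the standard doubly robust (AIPW-style) pseudo-outcome with a known trial propensity score. It is not the paper's QR-learner, whose exact pseudo-outcome is not given in this summary. The simulated data, the linear true CATE, and the names `mu0_fixed`, `mu1_fixed`, and `dr_pseudo_outcome` are all illustrative assumptions. The point it illustrates is that, when the propensity is known, regressing the pseudo-outcome on covariates targets the CATE even if the outcome models are fixed and wrong.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
e = 0.5  # known randomization probability in the trial (assumption)

# Simulated trial data (illustrative assumption, not from the paper)
x = rng.uniform(-1.0, 1.0, size=n)
a = rng.binomial(1, e, size=n)
tau_true = 1.0 + 2.0 * x  # true CATE used only in this simulation
y = np.sin(3 * x) + a * tau_true + rng.normal(scale=0.5, size=n)


# Deliberately misspecified, fixed outcome models (hypothetical names)
def mu0_fixed(x):
    return np.zeros_like(x)


def mu1_fixed(x):
    return 0.5 * np.ones_like(x)


def dr_pseudo_outcome(y, a, x, e, mu0, mu1):
    """Standard AIPW-style pseudo-outcome with a known propensity e."""
    m0, m1 = mu0(x), mu1(x)
    return (m1 - m0) + a * (y - m1) / e - (1 - a) * (y - m0) / (1 - e)


phi = dr_pseudo_outcome(y, a, x, e, mu0_fixed, mu1_fixed)

# Second stage: least-squares regression of the pseudo-outcome on (1, x)
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, phi, rcond=None)
print("estimated intercept and slope:", coef)
```

In this sketch the fitted coefficients should land near the simulated intercept 1 and slope 2, up to sampling noise, despite the wrong outcome models. Better outcome models would not change the target, only reduce the variance of the pseudo-outcome.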

Abstract

Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover individual-level treatment effect heterogeneity, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it can reduce the mean squared error relative to a trial-only CATE learner, and is guaranteed to recover the true CATE even when the external data are not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that it asymptotically matches or exceeds both component learners in terms of mean squared error. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.
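The combination step mentioned above can be illustrated with a small sketch. This is one plausible way to combine two CATE estimates, assuming a convex weight chosen by minimizing a squared-error risk against unbiased pseudo-outcomes on held-out trial data; the paper's actual procedure may differ. The function names `pseudo_risk` and `choose_weight`, the weight grid, and the simulated inputs are hypothetical.

```python
import numpy as np


def pseudo_risk(tau_hat, phi):
    """Mean squared difference between pseudo-outcomes and a CATE estimate."""
    return float(np.mean((phi - tau_hat) ** 2))


def choose_weight(tau_a, tau_b, phi, grid=None):
    """Pick w in [0, 1] minimizing the held-out pseudo-risk of w*tau_a + (1-w)*tau_b."""
    grid = np.linspace(0.0, 1.0, 21) if grid is None else grid
    risks = [pseudo_risk(w * tau_a + (1 - w) * tau_b, phi) for w in grid]
    best = int(np.argmin(risks))
    return float(grid[best]), risks


# Simulated held-out trial fold (illustrative assumption)
rng = np.random.default_rng(1)
n_val = 5000
x = rng.uniform(-1.0, 1.0, size=n_val)
tau_true = 1.0 + 2.0 * x
phi = tau_true + rng.normal(scale=3.0, size=n_val)  # stand-in for unbiased pseudo-outcomes

tau_external = tau_true + 0.4                             # low variance, biased
tau_trial = tau_true + rng.normal(scale=0.6, size=n_val)  # unbiased, noisy

w, risks = choose_weight(tau_external, tau_trial, phi)
combined = w * tau_external + (1 - w) * tau_trial
print("chosen weight on the external-data learner:", w)
```

Because the grid includes the weights 0 and 1, the selected combination can never have a larger held-out pseudo-risk than either component on that fold. When the pseudo-outcome is conditionally unbiased for the CATE, this risk differs from the mean squared error only by a term that does not depend on the estimate, which is why it is a reasonable selection criterion.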

Paper Structure

This paper contains 36 sections, 5 theorems, 44 equations, 4 figures, 2 tables.

Key Result

Theorem 1

Under Conditions asmp:consistency and asmp:strong_ignorability_trial where the propensity score $e(X)$ is known, for any fixed specification of the nuisance models $\eta_{\text{fixed}}$, the minimization problem in eq:pseudo_risk_obj always yields the true CATE as its unique solution provided that $

Figures (4)

  • Figure 1: (a) Type I error (lower is better) and power (higher is better) in the simulation study as the trial sample size increases, for the methods that support statistical testing for the presence of an effect modifier; results are reported over 500 repeated runs. (b) RMSE on the STAR dataset as the trial sample size increases, with a fixed external sample size of $n_0=1000$. We report the average RMSE and standard error over 200 repeated runs.
  • Figure 2: (a) Distributions of the outcome (average test scores) for rural and urban schools in the STAR dataset. We observe a slight shift in the mean between the two groups, suggesting potential violations of transportability, as the primary difference between the trial and the external population lies in school location. (b) t-SNE plot of the features, colored by study population. We observe some lack of overlap between the populations.
  • Figure 3: RMSE on the STAR dataset as the trial sample size increases, with a fixed external sample size of $n_0=1000$. We report the average RMSE and standard error over 200 repeated runs.
  • Figure 4: Separate plot for the additive bias correction method of Kallus et al. (2018), whose large RMSE values make it difficult to display alongside the other methods. RMSE on the STAR dataset as the trial sample size increases, with a fixed external sample size of $n_0=1000$. We report the average RMSE and standard error over 200 repeated runs.

Theorems & Definitions (12)

  • Theorem 1
  • Remark 1
  • Lemma 1
  • Theorem 2
  • Remark 2
  • Lemma 2
  • Theorem 3
  • proof
  • proof
  • proof
  • ...and 2 more