Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data
Rickard Karlsson, Piersilvio De Bartolomeis, Issa J. Dahabreh, Jesse H. Krijthe
TL;DR
The paper addresses the estimation of heterogeneous treatment effects in randomized trials when external data from other trials or observational studies are available but may not be aligned with the trial. It introduces the QR-learner, a model-agnostic approach that uses randomization-aware pseudo-outcomes to estimate the conditional average treatment effect (CATE) within the trial population, using external data to reduce estimation error and increase power. A complementary procedure combines the QR-learner with a trial-only DR-learner, with the combination tuned by cross-validation, so that the combined estimator's mean squared error is asymptotically no worse than that of either component. In simulations and a case study on the STAR dataset, the authors show that the QR-learner can improve CATE accuracy and statistical power even when the external data are imperfectly aligned, which supports its use for personalized decision-making in trial populations.
Abstract
Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover individual-level treatment effect heterogeneity, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it can reduce the mean squared error relative to a trial-only CATE learner, and it is guaranteed to recover the true CATE even when the external data are not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that its mean squared error is asymptotically no larger than that of either component learner. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.
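The idea of combining two CATE learners so that the result is no worse than either can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a two-arm trial with a known randomization probability e, swaps in a Horvitz-Thompson transformed outcome (whose conditional mean given covariates equals the CATE under randomization) as the validation target, and picks a single convex weight by least squares. The function names `ht_pseudo_outcome` and `best_convex_weight`, the simulated data, and the two stand-in prediction vectors are all hypothetical.

```python
import numpy as np


def ht_pseudo_outcome(y, a, e):
    """Transformed outcome; E[pseudo | X] equals the CATE when e is the
    known randomization probability (assumed constant here)."""
    return y * (a - e) / (e * (1.0 - e))


def best_convex_weight(tau_a, tau_b, pseudo):
    """Weight lam in [0, 1] minimizing
    mean((pseudo - lam * tau_a - (1 - lam) * tau_b) ** 2)."""
    diff = tau_a - tau_b
    denom = float(np.dot(diff, diff))
    if denom == 0.0:
        return 0.5  # predictions coincide, so every weight gives the same loss
    lam = float(np.dot(pseudo - tau_b, diff)) / denom
    # The loss is a convex quadratic in lam, so clipping the unconstrained
    # minimizer to [0, 1] gives the constrained minimizer.
    return float(np.clip(lam, 0.0, 1.0))


# Simulated trial (illustrative only): the true CATE is x.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
a = rng.binomial(1, 0.5, n)
y = x * a + rng.normal(0.0, 1.0, n)
pseudo = ht_pseudo_outcome(y, a, 0.5)

# Stand-ins for the predictions of two already-fitted CATE learners.
tau_a = x + rng.normal(0.0, 0.1, n)  # accurate but noisy
tau_b = np.full(n, 0.3)              # constant and biased

lam = best_convex_weight(tau_a, tau_b, pseudo)
tau_combined = lam * tau_a + (1.0 - lam) * tau_b
```

Because lam = 0 and lam = 1 are both feasible, the combined prediction cannot have a larger validation loss than either input on the data used to choose the weight. With fitted learners, the weight would normally be chosen on folds held out from model fitting.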
