Improving RCT-Based CATE Estimation Under Covariate Mismatch via Double Calibration

Samhita Pal; Jared D. Huling; Amir Asiaee

Abstract

We develop estimators that improve the precision of heterogeneous treatment effect estimates by borrowing information from observational studies when the covariates available in each data source do not perfectly match. Standard data-borrowing methods often assume perfectly matched covariates. We propose MR-OSCAR, an RCT-calibrated, two-stage estimation approach that first uses the observational data to impute the variables missing from the trial and then calibrates observational outcome predictions to the randomized trial, preserving the causal contrast. This stands in contrast to results for generalization, where imputation does not improve performance. Our theory gives finite-sample guarantees with a transparent error decomposition that includes an imputation error, which shrinks as the observational mapping becomes more predictable. Simulations show that imputation almost always outperforms naively using only the shared covariates, and they clarify when borrowing helps (strong predictability of the missing block, moderate trial size) and when it does not (poor predictability or dominant trial-only moderators). We motivate the approach with the Greenlight Plus trial on early childhood obesity and outline a forthcoming EHR analysis at Vanderbilt, highlighting the use of our method in common scenarios where data sources do not perfectly align.
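The two stages described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: linear models throughout; a simulated data-generating process with shared covariates Z, RCT-only covariates U and OS-only covariates V (the notation of Figures 2 and 4); a linear calibration of each OS prediction on the corresponding RCT arm; and a doubly robust (AIPW) pseudo-outcome using the known randomization probability `pi`. The helper names `ols_fit`, `simulate` and `mr_oscar_sketch` are hypothetical, and cross-fitting is omitted for brevity.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients (intercept first) via least squares; y may be 1-D or 2-D."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]

# Hypothetical data-generating process (illustration only).
W = np.array([[1.0, 0.5], [0.5, -1.0], [0.0, 1.0]])  # linear map Z -> V, same in both sources
G = np.array([1.0, -1.0])                             # effect of V on the outcome
B = np.array([0.5, 0.5, 0.5])                         # effect of Z on the outcome

def simulate(n, rng, pi=0.5):
    Z = rng.normal(size=(n, 3))                       # shared covariates
    U = rng.normal(size=(n, 2))                       # RCT-only covariates
    V = Z @ W + 0.3 * rng.normal(size=(n, 2))         # OS-only covariates, predictable from Z
    A = rng.binomial(1, pi, size=n)
    tau = 1.0 + Z[:, 0]                               # true CATE depends on Z only
    Y = Z @ B + V @ G + 0.5 * U[:, 0] + A * tau + rng.normal(size=n)
    return Z, U, V, A, Y

def mr_oscar_sketch(Z_os, V_os, A_os, Y_os, Z_r, U_r, A_r, Y_r, pi=0.5):
    # Stage 1a: learn the imputation map V ~ Z on the OS and impute V in the RCT.
    V_hat = ols_predict(ols_fit(Z_os, V_os), Z_r)
    # Stage 1b: arm-specific OS outcome models m_a(Z, V), evaluated at (Z, V_hat) in the RCT.
    X_os = np.column_stack([Z_os, V_os])
    X_imp = np.column_stack([Z_r, V_hat])
    os_pred = {a: ols_predict(ols_fit(X_os[A_os == a], Y_os[A_os == a]), X_imp)
               for a in (0, 1)}
    # Stage 2: calibrate each OS prediction on its RCT arm (linear in the prediction and U).
    cal = {}
    for a in (0, 1):
        F = np.column_stack([os_pred[a], U_r])
        m = A_r == a
        cal[a] = ols_predict(ols_fit(F[m], Y_r[m]), F)
    # Doubly robust pseudo-outcome using the known randomization probability pi.
    psi = (cal[1] - cal[0]
           + A_r * (Y_r - cal[1]) / pi
           - (1 - A_r) * (Y_r - cal[0]) / (1 - pi))
    # Final CATE model on the RCT covariates (Z, U).
    return ols_fit(np.column_stack([Z_r, U_r]), psi)

rng = np.random.default_rng(0)
Z_os, _, V_os, A_os, Y_os = simulate(5000, rng)
Z_r, U_r, _, A_r, Y_r = simulate(2000, rng)
coef = mr_oscar_sketch(Z_os, V_os, A_os, Y_os, Z_r, U_r, A_r, Y_r)
print(np.round(coef, 2))  # order: intercept, Z (3), U (2)
```

In this simulated setting the true CATE in the trial is one plus the first shared covariate, so the fitted coefficients should land near (1, 1, 0, 0, 0, 0). Calibrating on the OS prediction plus U is one simple way to let trial-only covariates enter; a richer calibration model would be needed if the OS outcome model were misspecified.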

Paper Structure (28 sections, 7 theorems, 80 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Methods for data borrowing from an RCT under covariate mismatch
Setup, background, and causal assumptions
Baseline Methods under Covariate Mismatch
Proposed Method: MR-OSCAR
Example: Sparse Linear Model
Theory
Function classes, complexity measures, and shift structure
Baseline risk bound for RACER
Error bounds for MR-OSCAR and imputation penalties
Augmentation-error decomposition for MR-OSCAR
Specialization of risk bounds to sparse linear models
Finite-sample experiments
Varying the Effect of OS-only Covariates
Varying the RCT Sample Size
...and 13 more sections

Key Result

Theorem 1 (Baseline risk bound for RACER, adapted from asiaee2023leveraging)

Suppose Assumptions (as:ident) and (as:rates) hold, and that RACER uses arm-specific outcome classes $\mathcal{M}^{r}_{a}$ and final CATE class $\mathcal{D}$, with nuisance estimators obtained via cross-fitting. Then there exists a constant $C>0$ such that, with probability at least $1-\gamma$, for all … Equivalently, in the localized-complexity notation of asiaee2023leveraging, if …
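Theorem 1 assumes nuisance estimators obtained via cross-fitting. As an illustration only, the sketch below cross-fits arm-specific linear outcome models on the trial and regresses a doubly robust pseudo-outcome on the trial covariates, using the known randomization probability. The DR-learner pseudo-outcome, the linear classes standing in for $\mathcal{M}^{r}_{a}$ and $\mathcal{D}$, the simulated data, and the function name `crossfit_dr_cate` are assumptions; the loss RACER actually uses may differ.

```python
import numpy as np

def ols_fit(X, y):
    """OLS coefficients (intercept first) via least squares."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def ols_predict(coef, X):
    return coef[0] + X @ coef[1:]

def crossfit_dr_cate(X, A, Y, pi=0.5, n_folds=2, seed=0):
    """Hypothetical cross-fitted DR-learner in an RCT with known treatment probability pi.

    Arm-specific outcome models are fit on the other folds and evaluated on the
    held-out fold, so no unit's nuisance prediction uses its own outcome.
    """
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds      # balanced random fold labels
    mu0, mu1 = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        for a, mu in ((0, mu0), (1, mu1)):
            m = train & (A == a)
            mu[test] = ols_predict(ols_fit(X[m], Y[m]), X[test])
    # Pseudo-outcome whose conditional mean given X equals the CATE under randomization.
    psi = mu1 - mu0 + A * (Y - mu1) / pi - (1 - A) * (Y - mu0) / (1 - pi)
    # Final-stage CATE regression (a linear class standing in for D).
    return ols_fit(X, psi)

# Simulated trial (illustration only): true CATE is 1 + X[:, 0].
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 0.5, size=n)
Y = X @ np.array([1.0, 0.5, -0.5]) + A * (1.0 + X[:, 0]) + rng.normal(size=n)
coef = crossfit_dr_cate(X, A, Y, pi=0.5)
print(np.round(coef, 2))
```

Because the propensity is known by design in a trial, only the outcome models are nuisance estimates here; cross-fitting keeps their estimation error from correlating with each unit's own noise.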

Figures (5)

Figure 1: Left: Mean RMSE (points) with $\pm 1$ SD (bars) versus the effect size of the OS-only covariates $\bm{V}$ (OS size fixed at $n^o=1000$, RCT size at $n^r=300$). Right: Mean RMSE versus RCT sample size $n^r$ (OS fixed at $n^o=1000$).
Figure 2: Heatmaps of RMSE gaps over the grid of covariate-mismatch fractions $(f_1,f_2)$, where $f_1$ is the share exclusive to the RCT ($\bm{U}$) and $f_2$ the share exclusive to the OS ($\bm{V}$).
Figure 3: Upper Panel: Sorted individual CATE estimates with 95% post-LASSO OLS confidence intervals; grey intervals cover 0, while red intervals correspond to subjects whose CIs exclude 0. Lower Panel: Covariate-specific effects on the CATE with 95% confidence intervals for the top 20 covariates (by maximum absolute effect size).
Figure 4: RMSE (mean $\pm$ 1 SD) for CATE estimation in the RCT population as a function of $R^2(\bm{V}\mid \bm{Z})$, which controls how well the OS-only covariates $\bm{V}$ can be predicted from $\bm{Z}$. MR-OSCAR (red) is slightly worse than RACER (green) when imputability is low, ties around moderate values, and dominates once $R^2(\bm{V}\mid \bm{Z})$ is sufficiently large; SR-OSCAR (blue) is uniformly worst.
Figure 5: Love plot of covariate balance.
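Figure 4 indexes imputability by $R^2(\bm{V}\mid \bm{Z})$. A minimal sketch, assuming a linear imputation model fit on the OS, computes this quantity for each OS-only covariate; the function name `r2_v_given_z` and the simulated data are hypothetical, and the paper's simulations may instead set this quantity through the data-generating process. Under linear imputation the in-sample imputation mean squared error equals $(1-R^2)\,\widehat{\mathrm{Var}}(V)$, which makes concrete why the imputation error shrinks as the missing block becomes more predictable.

```python
import numpy as np

def r2_v_given_z(Z, V):
    """Per-column in-sample R^2 from regressing each OS-only covariate on Z (with intercept)."""
    Z1 = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(Z1, V, rcond=None)
    resid = V - Z1 @ coef
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((V - V.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot

# Simulated OS (illustration only): two V columns with population R^2 of 0.5 and 0.9.
rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=(n, 3))
noise_sd = np.array([1.0, 1.0 / 3.0])
V = np.column_stack([Z[:, 0], Z[:, 0]]) + noise_sd * rng.normal(size=(n, 2))

r2 = r2_v_given_z(Z, V)
# In-sample imputation MSE of the linear imputer: (1 - R^2) times the (ddof=0) variance of V.
imp_mse = (1.0 - r2) * V.var(axis=0)
print(np.round(r2, 2), np.round(imp_mse, 3))
```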

Theorems & Definitions (11)

Theorem 1: Baseline risk bound for RACER, adapted from asiaee2023leveraging
Theorem 2: Risk bound for MR-OSCAR
Proposition 1: Augmentation error for MR-OSCAR
Corollary 2.1: MR-OSCAR can dominate SR-OSCAR
Theorem 3: MR-OSCAR risk bound in the sparse linear setting
Theorem 4: Risk bound for SR-OSCAR
Proposition 2: Augmentation error for shared-only borrowing
Proof of Proposition 1
Proof of Proposition 2
Proof of Theorem 2
...and 1 more
