Improving RCT-Based Treatment Effect Estimation Under Covariate Mismatch via Calibrated Alignment

Amir Asiaee; Samhita Pal

Abstract

Randomized controlled trials (RCTs) are the gold standard for estimating heterogeneous treatment effects, yet they are often underpowered for detecting effect heterogeneity. Large observational studies (OS) can supplement RCTs for conditional average treatment effect (CATE) estimation, but a key barrier is covariate mismatch: the two sources measure different, only partially overlapping, sets of covariates. We propose CALM (Calibrated ALignment under covariate Mismatch), which bypasses imputation by learning embeddings that map each source's features into a common representation space. OS outcome models are transferred to the RCT embedding space and calibrated using trial data, preserving the causal identification provided by randomization. Finite-sample risk bounds decompose into alignment error, outcome-model complexity, and calibration complexity terms, identifying when embedding alignment outperforms imputation. The calibration-based linear variant provides protection against negative transfer, whereas the neural variant can be vulnerable under severe distributional shift. Under sparse linear models, the embedding approach strictly generalizes imputation. Simulations across 51 settings confirm that (i) calibration-based methods perform nearly identically when the CATE is linear, and (ii) the neural embedding variant wins all 22 nonlinear-regime settings by large margins.
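As a rough illustration of the transfer-and-calibrate step, the sketch below makes several simplifying assumptions of our own: both encoders have already been applied, so the RCT embeddings $H = \phi^r(X^r)$ are given; the frozen OS outcome heads $\hat{\mu}^o_a$ are plain functions of the embedding; the learnable shift $\delta_a$ is linear in $H$; and pseudo-outcomes take the standard AIPW (doubly robust) form with the known randomization probability. The names `fit_linear` and `calibrated_pseudo_outcomes` are hypothetical, and the paper's CMO-augmented pseudo-outcome may differ in detail.

```python
import numpy as np


def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a prediction function."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef


def calibrated_pseudo_outcomes(H, A, Y, mu_o, pi=0.5):
    """AIPW-style pseudo-outcomes from frozen OS heads plus a learned linear shift.

    H    : (n, d) RCT embeddings, assumed already computed as phi_r(X^r)
    A    : (n,) binary treatment indicators
    Y    : (n,) observed RCT outcomes
    mu_o : {0: f0, 1: f1}, frozen OS outcome heads evaluated on embeddings
    pi   : known randomization probability P(A = 1)
    """
    mu_r = {}
    for a in (0, 1):
        arm = A == a
        # Hypothetical linear shift delta_a(h), fit on the OS head's residuals in arm a.
        # Fit in-sample here for brevity; the paper uses sample splitting / cross-fitting.
        delta_a = fit_linear(H[arm], Y[arm] - mu_o[a](H[arm]))
        mu_r[a] = mu_o[a](H) + delta_a(H)  # calibrated model: mu_r_a = mu_o_a + delta_a
    # Doubly robust pseudo-outcome; randomization makes pi known by design.
    return (mu_r[1] - mu_r[0]
            + A * (Y - mu_r[1]) / pi
            - (1 - A) * (Y - mu_r[0]) / (1 - pi))


# Toy check: OS heads have the right slopes but biased intercepts.
rng = np.random.default_rng(0)
n, d = 2000, 3
beta = np.array([1.0, -0.5, 0.2])
H = rng.normal(size=(n, d))
A = rng.binomial(1, 0.5, size=n)
tau = 1.0 + H[:, 0]                                  # true CATE
Y = H @ beta + A * tau + rng.normal(size=n)
mu_o = {0: lambda h: h @ beta + 0.7,                 # control head, intercept off by +0.7
        1: lambda h: h @ beta + h[:, 0] + 0.5}       # treated head, intercept off by -0.5
psi = calibrated_pseudo_outcomes(H, A, Y, mu_o)
```

Because the randomization probability is known, the AIPW pseudo-outcome has conditional mean equal to the CATE for any outcome model fit on independent data; in this reading, calibrating the OS model toward the RCT mainly reduces the variance of the pseudo-outcomes rather than securing identification.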

Paper Structure (53 sections, 5 theorems, 31 equations, 11 figures, 1 table, 1 algorithm)

Introduction
The covariate mismatch barrier.
Imputation is a harder problem than needed.
Our contribution: embedding alignment.
Related Work
Problem Setup and Background
Notation and Data Structure
The CMO-Augmented Pseudo-Outcome Framework
The R-Oscar Pipeline and Imputation-Based Baselines
R-Oscar without covariate mismatch.
Calm: Embedding-Aligned CATE Estimation
Core Idea: From Imputation to Alignment
Assumptions
The Calm Algorithm
Stage 1: OS outcome model in embedding space.
...and 38 more sections

Key Result

Theorem 1

Suppose all assumptions from the RCT assumption through the alignment assumption hold, and all nuisance estimators are obtained via penalized empirical risk minimization with sample splitting or cross-fitting across stages. Let $\mathcal{D}$ be the function class for the CATE correction in Stage 4 of Algorithm 1, and let $\mathcal{F} =

Figures (11)

Figure 1: How OS data inform pseudo-outcome construction in Calm. OS and RCT data pass through source-specific encoders $\phi^o$ (frozen after Stage 1) and $\phi^r$ (trainable at Stage 2) into a shared embedding space $\bm{H}\in\mathbb{R}^d$. The calibrated outcome model $\hat{\mu}^r_a = \hat{\mu}^o_a + \delta^t_a$ combines the frozen OS outcome head with a learnable shift. Pseudo-outcomes $\hat{\psi}^r_a$ are then constructed from these calibrated predictions together with the RCT covariates and outcomes. The calibrated model $\hat{\mu}^r_a$ is also used in the subsequent CATE calibration stage (Stage 4 of Algorithm 1).
Figure 2: RMSE of CATE estimation across three experimental sweeps. Mean over 20 replicates. In all panels, the blue band groups four calibration-based methods (Racer, SR-Oscar, MR-Oscar, Calm-Lin) whose RMSEs are nearly identical; the band spans the min-max envelope of their $\text{mean} \pm \text{SE}$ values. Individual lines show the remaining methods. (a) $n^r = 500$, $n^o = 10{,}000$, $d_{\mathrm{true}} = 5$, linear outcome. (b) Shared-latent DGP in which $\bm{X}^r$ carries information about $\bm{V}$ beyond $\bm{Z}$; Calm-NN separates from the calibration group as $\omega$ increases. (c) Nonlinear-CATE regime: Calm-NN keeps RMSE below $0.8$ even at $n^r = 100$, where calibration-based methods exceed $5.3$.
Figure 3: RMSE of CATE estimation as a function of intrinsic dimension $d_{\mathrm{true}}$, with $\sigma_V^2 = 1.0$, $n^r = 500$, and linear outcome model. Mean over 20 replicates. Blue band: calibration-group envelope (see Figure 2 caption). No single method dominates uniformly across intrinsic dimensions.
Figure 4: RMSE of CATE estimation as a function of RCT sample size $n^r$, with $n^o = 10{,}000$ fixed, $\sigma_V^2 = 1.0$, $d_{\mathrm{true}} = 5$, and linear outcome model. Mean over 20 replicates. Blue band: calibration-group envelope (see Figure 2 caption). HTCE methods perform best at the smallest $n^r$, while calibration-based methods converge as $n^r$ grows.
Figure 5: RMSE across outcome model types (linear, quadratic, sinusoidal), with $\sigma_V^2 = 1.0$, $n^r = 500$, $d_{\mathrm{true}} = 5$. Bars show mean RMSE over 20 replicates. The best method depends on the outcome type.
...and 6 more figures
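One reading of the blue band in Figure 2: for each method in the calibration group, compute the mean RMSE and its standard error over the 20 replicates at each x-axis value, then take the lowest mean minus SE and the highest mean plus SE across the group. A minimal sketch under that interpretation; the array layout and the function name `calibration_band` are our assumptions, not the paper's plotting code.

```python
import numpy as np


def calibration_band(rmse):
    """Envelope of mean +/- SE over a group of methods, per setting.

    rmse : array of shape (n_methods, n_settings, n_replicates)
    Returns (lower, upper), each of shape (n_settings,):
      lower = min over methods of (mean - SE)
      upper = max over methods of (mean + SE)
    """
    rmse = np.asarray(rmse, dtype=float)
    n_rep = rmse.shape[2]
    mean = rmse.mean(axis=2)
    se = rmse.std(axis=2, ddof=1) / np.sqrt(n_rep)  # standard error of the mean
    return (mean - se).min(axis=0), (mean + se).max(axis=0)


# Two methods, one setting, two replicates each.
lower, upper = calibration_band([[[1.0, 1.2]], [[1.1, 1.1]]])
# lower ≈ [1.0], upper ≈ [1.2]
```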

Theorems & Definitions (7)

Theorem 1: Risk bound for Calm
Proof sketch
Corollary 2: When Calm improves over MR-Oscar
Proposition 3: Safe borrowing
Proposition 4: Linear embedding and imputation
Corollary 5: Linear Calm risk bound
Proof of Theorem 1
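The statement of Proposition 3 (safe borrowing) is not reproduced on this page. As a minimal intuition only, assume the linear variant calibrates RCT outcomes by least squares on the transferred OS prediction, $y \approx \alpha + \beta\,\hat{\mu}^o$ (a simplification we introduce; Calm-Lin's exact form may differ). Because the intercept-only fit is nested inside this calibrated fit, the in-sample squared error can never exceed that of ignoring the OS model, and an uninformative OS prediction receives a coefficient near zero. The function `linear_calibration` is hypothetical.

```python
import numpy as np


def linear_calibration(pred_os, y):
    """Least-squares fit y ~ alpha + beta * pred_os on RCT data."""
    X = np.column_stack([np.ones_like(pred_os), pred_os])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    return alpha, beta


rng = np.random.default_rng(1)
n = 1000
signal = rng.normal(size=n)
y = 2.0 + signal + 0.5 * rng.normal(size=n)      # RCT outcomes in one arm

good_pred = signal                                # informative transferred OS prediction
bad_pred = rng.normal(size=n)                     # OS prediction unrelated to y

alpha_g, beta_g = linear_calibration(good_pred, y)  # beta_g near 1: borrow from the OS model
alpha_b, beta_b = linear_calibration(bad_pred, y)   # beta_b near 0: fall back to the RCT mean
```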
