Transfer Learning for Causal Effect Estimation

Song Wei, Hanyu Zhang, Ronald Moore, Rishikesan Kamaleswaran, Yao Xie

TL;DR

This work addresses causal-effect estimation with limited target-domain data by proposing $\ell_1$-regularized Transfer Causal Learning ($\ell_1$-TCL). The method transfers nuisance-model information from a related source domain and corrects it through a sparsity-driven bias adjustment before plug-in estimation of the average causal effect (ACE). It supports generalized linear model (GLM) nuisance models as well as neural network (NN) extensions, comes with non-asymptotic guarantees under sparsity, and performs favorably on synthetic and real data, including a sepsis case study on the ACE of vasopressors where naive baselines fail. Key contributions include a two-stage transfer procedure (a rough estimate from the source domain followed by $\ell_1$ bias correction), theoretical error bounds that separate the bias and rough-estimation error components, and a generic NN-friendly TCL framework with ParT for CATE-like problems. The results show the practical value of principled transfer in causal inference: it enables more reliable decision-making in data-limited medical settings, and covariate balance metrics can guide hyperparameter selection.
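The two-stage procedure summarized above can be sketched for a logistic (GLM) propensity score model as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the helper names (`l1_logistic`, `two_stage_ps`) are hypothetical, there is no intercept, the source stage is also $\ell_1$-penalized, and both stages are solved with proximal gradient descent (ISTA), which is simply one convenient solver for $\ell_1$-penalized logistic regression.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def soft_threshold(v, thresh):
    # Proximal operator of thresh * ||v||_1 (entrywise shrinkage toward zero)
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def l1_logistic(X, y, lam, offset=None, n_iter=2000):
    """Hypothetical helper: minimize mean logistic loss of (offset + X @ b) + lam * ||b||_1 via ISTA."""
    n, d = X.shape
    offset = np.zeros(n) if offset is None else offset
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the logistic loss
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    b = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(offset + X @ b) - y) / n
        b = soft_threshold(b - step * grad, step * lam)
    return b


def two_stage_ps(X_src, t_src, X_tgt, t_tgt, lam_src, lam_ps):
    # Stage 1 (rough estimate): fit the PS model on the larger source sample
    beta_src = l1_logistic(X_src, t_src, lam_src)
    # Stage 2 (bias correction): l1-penalized fit of the difference on the small target
    # sample, with the source fit X_tgt @ beta_src held fixed as an offset
    delta = l1_logistic(X_tgt, t_tgt, lam_ps, offset=X_tgt @ beta_src)
    return beta_src + delta, delta


# Toy data in the spirit of Figure 1: the domains differ only in the effect of X_2 (index 1)
rng = np.random.default_rng(0)
d = 10
beta_s = np.zeros(d)
beta_s[:3] = [1.0, -1.0, 0.5]
beta_t = beta_s.copy()
beta_t[1] = 1.0
X_s = rng.normal(size=(5000, d))
t_s = rng.binomial(1, sigmoid(X_s @ beta_s))
X_t = rng.normal(size=(300, d))
t_t = rng.binomial(1, sigmoid(X_t @ beta_t))
beta_hat, delta_hat = two_stage_ps(X_s, t_s, X_t, t_t, lam_src=0.01, lam_ps=0.05)
print(np.round(delta_hat, 2))
```

Here `lam_ps` plays the role of $\lambda_{\rm PS}$. With only 300 target samples, the penalty keeps most entries of `delta_hat` at or near zero, while its largest entry should sit on the shifted covariate.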

Abstract

We present a Transfer Causal Learning (TCL) framework for settings in which the target and source domains share the same covariate/feature space, with the aim of improving causal effect estimation when data are limited. Limited data are common in medical applications, for example when studying rare medical conditions such as sepsis. Our proposed method, named \texttt{$\ell_1$-TCL}, applies $\ell_1$-regularized transfer learning (TL) to nuisance models (e.g., the propensity score model); the TL estimates of the nuisance parameters are then plugged into downstream average causal/treatment effect estimators (e.g., the inverse probability weighted estimator). We establish non-asymptotic recovery guarantees for \texttt{$\ell_1$-TCL} with generalized linear model (GLM) nuisance models under a sparsity assumption in the high-dimensional setting, and demonstrate its empirical benefits through extensive numerical simulations with both GLM and recent neural network nuisance models. We further apply the method to real data, where it yields meaningful insights consistent with the medical literature, whereas all baseline methods fail.
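To make the plug-in step concrete, the sketch below computes the standard Horvitz-Thompson IPW estimate of the ACE from propensity scores, together with an IPW-weighted standardized mean difference (SMD), one common covariate balance metric of the kind the TL;DR mentions for hyperparameter selection. It is an illustration under stated assumptions, not the paper's $\widehat{\tau}_{\rm TLIPW}$ code: the function names are hypothetical, the clipping threshold is our own safeguard, and in $\ell_1$-TCL the scores would come from the transferred PS estimate rather than the true model used in this toy check.

```python
import numpy as np


def ipw_ace(y, t, e, clip=1e-3):
    """Horvitz-Thompson IPW estimate of the ACE E[Y(1) - Y(0)] (hypothetical helper)."""
    y, t = np.asarray(y, float), np.asarray(t, float)
    # Clip propensity scores away from 0 and 1 so that no single weight explodes
    e = np.clip(np.asarray(e, float), clip, 1.0 - clip)
    return np.mean(t * y / e - (1.0 - t) * y / (1.0 - e))


def weighted_smd(x, t, e, clip=1e-3):
    """IPW-weighted standardized mean difference of one covariate (hypothetical helper)."""
    x, t = np.asarray(x, float), np.asarray(t, bool)
    e = np.clip(np.asarray(e, float), clip, 1.0 - clip)
    w1, w0 = 1.0 / e[t], 1.0 / (1.0 - e[~t])
    m1 = np.sum(w1 * x[t]) / np.sum(w1)   # weighted mean among the treated
    m0 = np.sum(w0 * x[~t]) / np.sum(w0)  # weighted mean among the controls
    pooled_sd = np.sqrt((x[t].var() + x[~t].var()) / 2.0)
    return (m1 - m0) / pooled_sd


# Toy confounded data: X drives both treatment and outcome; the true ACE is 1
rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))
t = rng.binomial(1, e_true)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()  # biased by confounding through X
ipw = ipw_ace(y, t, e_true)
print(round(naive, 2), round(ipw, 2))
# Constant scores give the unweighted SMD; true scores should give near-perfect balance
print(round(weighted_smd(x, t, np.full(n, 0.5)), 2), round(weighted_smd(x, t, e_true), 2))
```

The naive difference in means absorbs the confounding effect of `x`, while the IPW estimate should land near 1; the SMD shows the same contrast at the covariate level, which is why balance can serve as a proxy criterion when tuning the PS model.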

Paper Structure (68 sections, 6 theorems, 106 equations, 5 figures, 12 tables)

Key Result

Lemma 1

Under Assumptions A0, A1, A2 and A3, when the propensity score (PS) model (\ref{eq:GLM_propensityscore}) is correctly specified and the difference $\Delta_\beta$ (\ref{eq:beta_diff}) is $s$-sparse, the following holds for the estimator $\widehat{\beta}_\texttt{t}$ (\ref{eq:step2}) with regularization strength parameter $\lambda_{\rm PS} > 0$:

Figures (5)

  • Figure 1: In our toy example, the treatment assignments differ between target and source domains in that the effects from covariate $X_2$ are different. We do not impose assumptions on whether or not the ACEs are the same for both domains.
  • Figure 2: Support of the PS model parameters estimated via $\ell_1$-regularized logistic regression in both domains in the motivating real example. The two domains share very similar supports, and their difference is supported on only 6 of the 34 features (listed in Table \ref{table:real_data_features} in Section \ref{sec:real_exp}). Since the PS model parameter essentially describes how clinicians assign treatment based on the collected electronic medical record (EMR) data, this "similar support" observation may indicate that $\Delta_\beta$ is sparse in our real example.
  • Figure 3: Illustration of the general approach to the TCL problem. In our proposed $\ell_1$-TCL framework, the nuisance parameter estimation stage uses $\ell_1$-regularized TL, and the plug-in estimation stage considers IPW, OR, and DR estimators.
  • Figure 4: Illustration of selection bias. In practice (top), especially in observational studies, treatment assignment typically depends on the pre-treatment covariates $\boldsymbol{X}$, so the selected (or observed) treatment cohort is NOT independent of the outcome variable. As a result, the selected cohort is not "representative" of the whole population, and inference based on it will typically be biased.
  • Figure 5: Comparison between our proposed $\ell_1$-TCL and the TO-CL (left) and Merge-CL (right) baseline learning frameworks. In each sub-heatmap, the x-axis represents the sparsity $s$ and the y-axis the dimensionality $d$. Each cell reports the baseline's average ACE estimation error minus that of $\ell_1$-TCL, so positive values indicate improved accuracy; negative values are truncated to zero for better visualization.
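The support comparison behind Figure 2, and the sparsity of $\Delta_\beta$ it motivates, can be checked mechanically once both PS coefficient vectors are estimated. The sketch below uses hypothetical helpers, not the authors' code: entries with magnitude at most `tol` are treated as zero (the tolerance is our assumption), and a vector counts as $s$-sparse in the usual sense of having at most $s$ nonzero entries. Because the caption can be read either as comparing supports or as taking the support of the coefficient difference, both are shown; the coefficient values are made up for illustration.

```python
import numpy as np


def support(v, tol=1e-8):
    """Indices of the entries of v whose magnitude exceeds tol."""
    return set(np.flatnonzero(np.abs(np.asarray(v, float)) > tol).tolist())


def compare_supports(beta_src, beta_tgt, tol=1e-8):
    s_src, s_tgt = support(beta_src, tol), support(beta_tgt, tol)
    # Features selected in both domains vs. features selected in only one of them
    return s_src & s_tgt, s_src ^ s_tgt


def is_s_sparse(v, s, tol=1e-8):
    """True if v has at most s entries of magnitude above tol."""
    return len(support(v, tol)) <= s


# Illustrative (made-up) l1-logistic PS coefficients for the two domains
beta_src = np.array([0.8, 0.0, -0.5, 0.0, 0.0, 1.2])
beta_tgt = np.array([0.7, 0.3, -0.5, 0.0, 0.0, 0.0])
shared, differing = compare_supports(beta_src, beta_tgt)
delta = beta_tgt - beta_src
print(sorted(shared), sorted(differing), sorted(support(delta)))
```

Note that separately fitted coefficients usually differ slightly even on shared features (index 0 here), so the support of the raw difference can be larger than the set of features selected in only one domain.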

Theorems & Definitions (11)

  • Definition 1: $s$-sparse vector
  • Definition 2: Compatibility Condition \cite{bastani2021predicting}
  • Lemma 1: Transferable guarantee for PS model
  • Remark 1
  • Lemma 2
  • Theorem 1: Non-asymptotic recovery guarantee for $\widehat{\tau}_{\rm TLIPW}$ (\ref{eq:TLIPW})
  • Definition 3
  • Lemma 3: Transferable guarantee for OR model, cf. Theorem 5 of \cite{bastani2021predicting}
  • Theorem 2: Non-asymptotic recovery guarantee for $\widehat{\tau}_{\rm TLDR}$ (\ref{eq:TLDR})
  • Theorem 3: Non-asymptotic recovery guarantee for $\widehat{\tau}_{\rm TLDR}$ (\ref{eq:TLDR})
  • ...and 1 more
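For reference, the doubly robust (DR) estimator covered by Theorems 2 and 3 combines an outcome regression (OR) model with the PS model. The sketch below uses the textbook augmented IPW (AIPW) form under our own assumptions: the helper name is hypothetical, the nuisance predictions are simply passed in (in $\ell_1$-TCL they would be the transferred PS and OR estimates), no cross-fitting is shown, and it is not claimed to match the paper's $\widehat{\tau}_{\rm TLDR}$ term by term. The toy check illustrates double robustness by making one nuisance model deliberately wrong at a time.

```python
import numpy as np


def dr_ace(y, t, e, mu0, mu1, clip=1e-3):
    """AIPW (doubly robust) estimate of the ACE (hypothetical helper).

    e: propensity scores P(T=1 | X); mu0, mu1: OR predictions E[Y | X, T=0] and E[Y | X, T=1].
    """
    y, t = np.asarray(y, float), np.asarray(t, float)
    e = np.clip(np.asarray(e, float), clip, 1.0 - clip)
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    # OR contrast plus IPW-weighted residual corrections for each arm
    psi = mu1 - mu0 + t * (y - mu1) / e - (1.0 - t) * (y - mu0) / (1.0 - e)
    return psi.mean()


# Toy confounded data with true ACE = 1
rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
e_true = 1.0 / (1.0 + np.exp(-x))
t = rng.binomial(1, e_true)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)

# Correct OR model, deliberately wrong (constant) PS model
dr_or_ok = dr_ace(y, t, np.full(n, 0.5), mu0=2.0 * x, mu1=1.0 + 2.0 * x)
# Correct PS model, deliberately wrong (all-zero) OR model
dr_ps_ok = dr_ace(y, t, e_true, mu0=np.zeros(n), mu1=np.zeros(n))
print(round(dr_or_ok, 2), round(dr_ps_ok, 2))
```

Both estimates should land close to the true ACE of 1, since only one of the two nuisance models needs to be correct for the AIPW form to be consistent.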