Collaborative Heterogeneous Causal Inference Beyond Meta-analysis

Tianyu Guo, Sai Praneeth Karimireddy, Michael I. Jordan

TL;DR

The paper tackles external validity in causal inference under cross-site heterogeneity by introducing a collaborative inverse propensity score weighting (Clb-IPW) framework that directly aggregates site-specific propensity scores, enabling effective collaboration across disjoint domains. It further integrates outcome models via a decoupled AIPW estimator that leverages public target-census data and federated learning to preserve privacy while achieving asymptotic normality under standard rate conditions. Theoretical results show that Clb-IPW improves efficiency over traditional meta-analysis, and the decoupled AIPW approach retains robustness through orthogonal learning and domain-adaptation strategies. Empirical results on synthetic and real datasets demonstrate enhanced stability and accuracy across varying heterogeneity levels and model misspecifications, highlighting practical impact for privacy-preserving, multi-center causal inference.

Abstract

Collaboration between different data centers is often challenged by heterogeneity across sites. To account for the heterogeneity, the state-of-the-art method is to re-weight the covariate distributions in each site to match the distribution of the target population. Nevertheless, this method can easily fail when a site does not cover the entire population. Moreover, it still relies on the concept of traditional meta-analysis after adjusting for the distribution shift. In this work, we propose a collaborative inverse propensity score weighting estimator for causal inference with heterogeneous data. Instead of adjusting for the distribution shift separately in each site, we use weighted propensity score models to adjust for it collaboratively. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases. Because density estimation is vulnerable to estimation error, we further discuss the double machine learning method and show the possibility of using nonparametric density estimation with d<8 and a flexible machine learning method to guarantee asymptotic normality. We propose a federated learning algorithm to collaboratively train the outcome model while preserving privacy. Using synthetic and real datasets, we demonstrate the advantages of our method.
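
The failure mode described in the abstract, where a site does not cover the entire target population, can be illustrated with a toy simulation. The sketch below is not the paper's Clb-IPW estimator; it is a minimal illustration under loudly stated assumptions: two hypothetical sites with disjoint binary covariate support, known propensity scores, known covariate distributions, and a made-up outcome model. It compares re-weighting each site separately and then pooling by inverse variance against weighting every unit by the target distribution divided by the pooled mixture distribution of all sites.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_site(n, x_value, e, rng):
    """Hypothetical site whose covariate X only takes the value x_value."""
    x = np.full(n, x_value)
    z = rng.binomial(1, e, size=n)           # randomized treatment, known propensity e
    tau = np.where(x == 0, 1.0, 3.0)         # assumed effects: tau(0) = 1, tau(1) = 3
    y = z * tau + rng.normal(0.0, 1.0, n)    # outcome with unit-variance noise
    return x, z, y

def ipw_terms(z, y, e):
    """Per-unit IPW contrast Z*Y/e - (1-Z)*Y/(1-e)."""
    return z * y / e - (1 - z) * y / (1 - e)

# Assumed target population: X uniform on {0, 1}, so the target ATE is
# 0.5 * 1 + 0.5 * 3 = 2.
p_target = {0: 0.5, 1: 0.5}
sites = [
    {"n": 4000, "x_value": 0, "e": 0.5},     # site A only observes X = 0
    {"n": 6000, "x_value": 1, "e": 0.3},     # site B only observes X = 1
]
data = [simulate_site(s["n"], s["x_value"], s["e"], rng) for s in sites]
n_total = sum(s["n"] for s in sites)

# Separate adjustment: each site re-weights by p_target(x) / p_site(x), where
# p_site(x) = 1 on the site's own support, then the site estimates are pooled
# by inverse variance weighting (the meta-analysis step).
site_est, site_var = [], []
for s, (x, z, y) in zip(sites, data):
    terms = p_target[s["x_value"]] / 1.0 * ipw_terms(z, y, s["e"])
    site_est.append(terms.mean())
    site_var.append(terms.var(ddof=1) / s["n"])
inv_var = 1.0 / np.array(site_var)
meta_estimate = float(np.sum(inv_var * np.array(site_est)) / np.sum(inv_var))

# Pooled-mixture adjustment: weight each unit by
# p_target(x) / sum_k (n_k / N) * p_k(x) and average over all N units.
p_mix = {s["x_value"]: s["n"] / n_total for s in sites}
pooled_sum = 0.0
for s, (x, z, y) in zip(sites, data):
    weight = p_target[s["x_value"]] / p_mix[s["x_value"]]
    pooled_sum += np.sum(weight * ipw_terms(z, y, s["e"]))
pooled_estimate = float(pooled_sum / n_total)

print(f"separate adjustment + meta-analysis: {meta_estimate:.3f}")
print(f"pooled-mixture weighting:            {pooled_estimate:.3f}")
```

In this toy setup neither site can recover the target effect on its own, so averaging the two site-level estimates stays away from the true value of 2, while the pooled-mixture weights use the two sites jointly to cover the target population.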

Paper Structure

This paper contains 25 sections, 10 theorems, 86 equations, 3 figures, 2 algorithms.

Key Result

Proposition 1

Given the unconfoundedness and individual-overlap assumptions, using inverse variance weighting, as $N \to \infty$, we have that ... where ...
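
Inverse variance weighting, the pooling rule named in Proposition 1, can be sketched generically as follows. This is the standard textbook rule, not the paper's specific limiting result; the function name `inverse_variance_pool` and the example numbers are hypothetical.

```python
import numpy as np

def inverse_variance_pool(estimates, variances):
    """Pool independent site-level estimates with weights proportional to 1/variance."""
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precision = 1.0 / variances
    weights = precision / precision.sum()      # normalized weights, sum to 1
    pooled = float(np.sum(weights * estimates))
    pooled_var = float(1.0 / precision.sum())  # variance of the pooled estimate
    return pooled, pooled_var, weights

# Two hypothetical sites: the more precise site (variance 1.0) dominates.
pooled, pooled_var, weights = inverse_variance_pool([1.0, 2.0], [1.0, 4.0])
print(pooled, pooled_var, weights)
```

The pooled variance is never larger than the smallest site variance, which is why meta-analysis uses this rule when site estimates are unbiased for the same quantity.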

Figures (3)

  • Figure 1: Visualization of the data-generating process and the comparison of proposed estimators.
  • Figure 2: The 95% confidence intervals for the synthetic dataset and the real dataset. The red dots mark the true effect size. In the synthetic-data panel, Clb-ipw shows smaller variance than Meta-ipw under all scenarios. The aipw estimator remains consistent when either the PS or the OM model is correctly specified. In the real-data panel, $\tau_1$ denotes the estimated causal effect in Pennycook et al. (2020), and $\tau_2$ denotes that in Roozenbeek et al. (2021). We find that the Meta-ipw, Clb-ipw, Meta-aipw, and Clb-aipw estimators have similar performance, with Meta-ipw and Meta-aipw showing slightly larger effect sizes.
  • Figure 3: The mean squared error as heterogeneity changes. We use $X^\prime$ for all misspecified models. When both models fail to fit the data, there is no theoretical guarantee and all estimators have large mean squared error, so the better performance of Meta-ipw in that setting is not meaningful.

Theorems & Definitions (21)

  • Example 1: Collaboration of Clinical Trials
  • Example 2: Collaboration of Observational Studies with Unmeasured Confounder
  • Proposition 1: Meta-ipw Estimator
  • Theorem 2: Clb-ipw Estimator
  • Theorem 3
  • Proposition 4
  • Proposition 5: Point-wise error of density estimation
  • Theorem 6
  • Theorem 7
  • Definition A.1
  • ...and 11 more