Causal Data Fusion for Panel Data without a Pre-Intervention Period
Zou Yang, Seung Hee Lee, Julia R. Köhler, AmirEmad Ghassami
TL;DR
The paper tackles causal inference in panel settings where no pre-intervention data exist by introducing two data-fusion frameworks that borrow information from an auxiliary reference domain. The equi-confounding data fusion framework provides linear and multiplicative (logarithmic) identifiability conditions that link target and reference outcomes to identify the target causal effect $\psi_0$, with unbiased or bias-bounded estimators. The synthetic control data fusion approach generalizes donor-weight matching across two domains, coupling outcome and covariate alignment via a constrained optimization and deriving finite-sample bias bounds under a factor model with latent confounders. In simulations and in an application to a COVID-19 vaccination intervention in Chelsea, Massachusetts, the methods perform robustly and show practical utility for urgent, data-constrained policy evaluation; replication materials are publicly available.
Abstract
Traditional panel-data causal inference frameworks, such as difference-in-differences and synthetic control methods, rely on pre-intervention data to estimate counterfactual means. However, such data may be unavailable in real-world settings when interventions are implemented in response to sudden events, such as public health crises or epidemiological shocks. In this paper, we introduce two data-fusion methods for causal inference from panel data in scenarios where pre-intervention data are unavailable. These methods leverage auxiliary reference domains with related panel data to estimate causal effects in the target domain, thereby overcoming the limitations imposed by the absence of pre-intervention data. We demonstrate the efficacy of these methods by deriving bounds on the absolute bias that converge to zero under suitable conditions, as well as through simulations across a variety of panel-data settings. Our proposed methodology renders causal inference feasible in urgent and data-constrained environments where the assumptions of existing causal inference frameworks are not met. As an application of our methodology, we evaluate the effect of a community organization vaccination intervention in Chelsea, Massachusetts on COVID-19 vaccination rates.
