Doubly robust identification of treatment effects from multiple environments
Piersilvio De Bartolomeis, Julia Kostin, Javier Abad, Yixin Wang, Fanny Yang
TL;DR
RAMEN tackles the challenge of identifying treatment effects from observational data in the presence of bad controls and unobserved variables by exploiting heterogeneity across multiple environments. It introduces a doubly robust identification framework that requires only partial knowledge of the causal graph: either the parents of the treatment or the parents of the outcome are observed, and the node whose parents are observed satisfies an invariance assumption across environments. The methodology combines a population-level invariant-set formulation with practical finite-sample estimators, including a kernelized invariance loss and a differentiable Gumbel-trick approach for scalable covariate selection. Empirical evaluations on synthetic, semi-synthetic, and real-world data demonstrate strong performance relative to baselines, with a real-world maternal smoking–birth weight analysis aligning with established epidemiological findings. Together, these contributions advance causal identification under bad controls and unobserved confounding by leveraging cross-environment heterogeneity for robust estimation of the average treatment effect (ATE).
Abstract
Practical and ethical constraints often require the use of observational data for causal inference, particularly in medicine and the social sciences. Yet observational datasets are prone to confounding, potentially compromising the validity of causal conclusions. Such biases can be corrected when the underlying causal graph is known, but this is rarely the case in practice. A common strategy is to adjust for all available covariates, yet this approach can yield biased treatment effect estimates, especially when post-treatment or unobserved variables are present. We propose RAMEN, an algorithm that produces unbiased treatment effect estimates by leveraging the heterogeneity of multiple data sources without the need to know or learn the underlying causal graph. Notably, RAMEN achieves doubly robust identification: it can identify the treatment effect whenever the causal parents of the treatment or those of the outcome are observed, and the node whose parents are observed satisfies an invariance assumption. Empirical evaluations on synthetic and real-world datasets show that our approach outperforms existing methods.
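To make the "adjust for all available covariates" failure mode concrete, the following toy simulation may help. It is not RAMEN and is not taken from the paper; it is a minimal sketch under assumed linear structural equations, with hypothetical names (`simulate`, `ols_coef`) and a hypothetical true effect `tau = 2.0`. A pre-treatment confounder `x` affects both treatment `t` and outcome `y`, while `c` is a post-treatment variable caused by both `t` and `y` (a bad control). Adjusting for `x` alone recovers the effect; adding `c` to the adjustment set biases it.

```python
import numpy as np


def ols_coef(y, *regressors):
    """Least-squares coefficients of y on an intercept plus the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def simulate(n=100_000, tau=2.0, seed=0):
    """Toy data-generating process (an assumption for illustration only).

    x: observed pre-treatment confounder
    t: binary treatment, depends on x
    y: outcome, depends on t and x; tau is the true treatment effect
    c: post-treatment variable caused by both t and y (a bad control)
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    t = (x + rng.normal(size=n) > 0).astype(float)
    y = tau * t + x + rng.normal(size=n)
    c = t + y + rng.normal(size=n)

    # Coefficient on t when adjusting only for the valid covariate x.
    valid_adjustment = ols_coef(y, t, x)[1]
    # Coefficient on t when adjusting for every available covariate, including c.
    all_covariates = ols_coef(y, t, x, c)[1]
    return valid_adjustment, all_covariates


valid_adjustment, all_covariates = simulate()
print(f"adjusting for x only:  {valid_adjustment:.3f}")
print(f"adjusting for x and c: {all_covariates:.3f}")
```

In this particular linear setup, the estimate that adjusts only for `x` should land close to the assumed `tau = 2.0`, whereas including `c` pulls the coefficient on `t` toward roughly 0.5, since conditioning on a descendant of the outcome absorbs part of the effect. Choosing a valid adjustment set without knowing which covariates play which role is the problem the abstract describes.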
