Zero-shot causal learning

Hamed Nilforoshan, Michael Moor, Yusuf Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec

TL;DR

CaML introduces zero-shot causal learning by framing intervention-specific CATE estimation as meta-learning tasks and synthesizing natural experiments to produce pseudo-outcomes. A single meta-model fuses intervention attributes ($W$) with individual covariates ($X$) to predict personalized effects $\tau_w(x)$ for unseen interventions, supported by a theoretical generalization bound and a Reptile-inspired training regime. Empirical results on large-scale medical claims and LINCS cell-line data show CaML surpasses strong zero-shot baselines and matches or exceeds baselines trained on test interventions, highlighting its ability to generalize across thousands of interventions and even to unseen drug combinations. This framework enables principled, scalable prediction of novel intervention effects, with significant implications for personalized medicine, policy design, and drug discovery.
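
For reference, the personalized effect $\tau_w(x)$ can be written in standard potential-outcomes notation. This is a sketch under the usual convention, not a definition quoted from this summary; the symbols $Y(w)$ and $Y(0)$ are assumptions introduced here to denote the outcome an individual would have with intervention $w$ and with no intervention, respectively:

$$
\tau_w(x) \;=\; \mathbb{E}\big[\, Y(w) - Y(0) \;\big|\; X = x \,\big]
$$

Zero-shot prediction then means estimating $\tau_{w'}(x)$ for an intervention $w'$ that has no recipients in the training data, using only its attributes $W'$.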

Abstract

Predicting how different interventions will causally affect a specific individual is important in a variety of domains such as personalized medicine, public policy, and online marketing. There are a large number of methods to predict the effect of an existing intervention based on historical data from individuals who received it. However, in many settings it is important to predict the effects of novel interventions (e.g., a newly invented drug), which these methods do not address. Here, we consider zero-shot causal learning: predicting the personalized effects of a novel intervention. We propose CaML, a causal meta-learning framework which formulates the personalized prediction of each intervention's effect as a task. CaML trains a single meta-model across thousands of tasks, each constructed by sampling an intervention, its recipients, and its nonrecipients. By leveraging both intervention information (e.g., a drug's attributes) and individual features (e.g., a patient's history), CaML is able to predict the personalized effects of novel interventions that do not exist at the time of training. Experimental results on real-world datasets in large-scale medical claims and cell-line perturbations demonstrate the effectiveness of our approach. Most strikingly, CaML's zero-shot predictions outperform even strong baselines trained directly on data from the test interventions.
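
The task construction described in the abstract (sample an intervention, then its recipients and nonrecipients) might look roughly like the following. This is a minimal sketch, not the authors' code: the function name `sample_task`, the `n_per_arm` parameter, and the convention that `-1` marks "no intervention" are all assumptions made for illustration.

```python
import numpy as np

def sample_task(treatment_ids, rng, n_per_arm=4):
    """Sample one task: an intervention plus recipient and nonrecipient indices.

    treatment_ids[i] is the intervention individual i received; -1 means
    "no intervention" (a convention assumed for this sketch).
    """
    # Candidate tasks are the distinct interventions seen in the data.
    interventions = np.unique(treatment_ids[treatment_ids >= 0])
    w = int(rng.choice(interventions))

    # Recipients received w; everyone else is a nonrecipient of w.
    recipients = np.flatnonzero(treatment_ids == w)
    nonrecipients = np.flatnonzero(treatment_ids != w)

    # Subsample each arm without replacement, capped by arm size.
    treated = rng.choice(recipients, size=min(n_per_arm, recipients.size), replace=False)
    control = rng.choice(nonrecipients, size=min(n_per_arm, nonrecipients.size), replace=False)
    return w, treated, control

rng = np.random.default_rng(0)
treatment_ids = np.array([0, 0, 1, -1, 2, 1, -1, 0, 2, -1])  # toy data
w, treated, control = sample_task(treatment_ids, rng)
```

Repeating this draw many times yields the collection of tasks over which a single meta-model would be trained.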

Paper Structure

This paper contains 40 sections, 7 theorems, 48 equations, 3 figures, 19 tables, 1 algorithm.

Key Result

Theorem 1

Under our assumptions, with probability $1-\delta$,

Figures (3)

  • Figure 1: Overview of the zero-shot causal learning problem. Each individual has features ($X$), an intervention with features ($W$), and an outcome ($Y$). Lightning bolts represent interventions (e.g., drugs). The personalized effect of an intervention ($\tau$) is always unobserved. The goal is to predict $\tau$ for a novel intervention ($W'$) and individual ($X'$) that did not exist during training.
  • Figure 2: Visual illustration of the CaML (causal meta-learning) framework. (1) We sample a task (i.e., an intervention) and a natural experiment from the training data consisting of individuals who either received the intervention or did not. Each individual has features ($X$) and an outcome ($Y$), and the intervention also has information ($W$) (e.g., a drug's attributes). (2) For each individual we estimate the effect of the intervention on the outcome (pseudo-outcomes $\tilde{\tau}$). (3) We predict an individual's pseudo-outcomes $\tilde{\tau}$ using a model that fuses $X$ and $W$. CaML is trained by repeating this procedure across many tasks and corresponding natural experiments.
  • Figure 3: Measuring the robustness of CaML to limitations in the training intervention data. We downsample the number of training interventions and measure CaML's performance. Overall, CaML's zero-shot capabilities improve as the set of unique training interventions grows. Nevertheless, CaML still achieves strong performance on smaller datasets (e.g., runs with 60% and 80% of the interventions achieve similar performance). Results are analogous for other metrics on both datasets. Top: Performance on the Claims dataset at predicting the effect of a novel drug on the likelihood of pancytopenia onset (RATE @ 0.998). Bottom: Performance on the LINCS dataset at predicting the gene expression of the top 20 and top 50 most differentially expressed genes (DEGs).
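
The three steps in the Figure 2 caption, combined with a Reptile-style outer update, can be sketched on synthetic data as follows. Everything here is an illustrative assumption rather than the paper's implementation: the linear model, the fusion of $X$ and $W$ by concatenation, the simulated randomized assignment, and in particular the pseudo-outcome, which is a crude stand-in (treated outcome minus the mean control outcome) for whatever estimator the authors use. On real observational data this stand-in would not adjust for confounding.

```python
import numpy as np

rng = np.random.default_rng(0)
D_X, D_W = 3, 2                      # feature sizes (illustrative)
A = np.array([1.0, -2.0, 0.5])       # hypothetical true effect weights on X
B = np.array([1.5, -1.0])            # hypothetical true effect weights on W

def true_effect(X, w_feat):
    # Synthetic ground truth tau_w(x); linear only to keep the sketch small.
    return X @ A + w_feat @ B

def make_task(n=32):
    """Step 1: simulate a natural experiment for a freshly drawn intervention."""
    w_feat = rng.uniform(-1.0, 1.0, size=D_W)
    X_treated = rng.normal(size=(n, D_X))
    y_control = 1.0 + 0.1 * rng.normal(size=n)
    y_treated = 1.0 + true_effect(X_treated, w_feat) + 0.1 * rng.normal(size=n)
    return X_treated, y_treated, y_control, w_feat

def fuse(X, w_feat):
    # Step 3 input: concatenate individual features with intervention features.
    return np.hstack([X, np.tile(w_feat, (X.shape[0], 1))])

def reptile_step(theta, task, inner_lr=0.05, inner_steps=5, meta_lr=0.5):
    X_treated, y_treated, y_control, w_feat = task
    # Step 2: stand-in pseudo-outcomes (assumption, see lead-in).
    tau_tilde = y_treated - y_control.mean()
    Z = fuse(X_treated, w_feat)
    # Inner loop: a few gradient steps on squared error for this task.
    adapted = theta.copy()
    for _ in range(inner_steps):
        grad = 2.0 * Z.T @ (Z @ adapted - tau_tilde) / len(tau_tilde)
        adapted -= inner_lr * grad
    # Reptile outer update: move meta-parameters toward the adapted ones.
    return theta + meta_lr * (adapted - theta)

theta = np.zeros(D_X + D_W)
for _ in range(300):
    theta = reptile_step(theta, make_task())

# Zero-shot check: an intervention whose features were never seen in training.
w_new = rng.uniform(-1.0, 1.0, size=D_W)
X_new = rng.normal(size=(200, D_X))
tau_true = true_effect(X_new, w_new)
mse_before = float(np.mean(tau_true ** 2))  # error of the untrained (zero) model
mse_after = float(np.mean((fuse(X_new, w_new) @ theta - tau_true) ** 2))
```

Because each task exposes the model to only one value of $W$, information about how effects vary with intervention attributes accumulates only across tasks, which is why the outer loop iterates over many of them.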

Theorems & Definitions (11)

  • Theorem 1
  • Theorem 4
  • Lemma 5: Theorem 2.3 of bousquet2002bennett
  • Lemma 6
  • proof : Proof of Lemma \ref{lem:var}
  • Lemma 7
  • proof
  • proof : Proof of Theorem \ref{thm:formal}
  • Lemma 9: Lemma 5 of meir2003generalization
  • Lemma 10
  • ...and 1 more