Amortized Active Causal Induction with Deep Reinforcement Learning
Yashas Annadani, Panagiotis Tigas, Stefan Bauer, Adam Foster
TL;DR
The paper tackles sample-efficient causal structure learning from interventions when the data likelihood is unavailable. It proposes CAASL, a transformer-based amortized policy trained with reinforcement learning to design interventions, with the design problem framed as a hidden-parameter Markov decision process (HiP-MDP). An AVICI-based reward guides intervention selection, and because the policy is amortized it adapts in real time and generalizes zero-shot to higher dimensions and unseen intervention types. Experiments on synthetic linear structural causal models (SCMs) and the SERGIO single-cell gene-regulatory simulator show improved causal graph recovery and robust performance under distribution shifts. The approach is connected to sequential Bayesian experimental design through information-gain bounds, and it is positioned as a scalable option for lab-in-the-loop experimentation in complex biological systems.
Abstract
We present Causal Amortized Active Structure Learning (CAASL), an active intervention design policy that selects interventions adaptively and in real time, without requiring access to the likelihood. This policy, an amortized network based on the transformer, is trained with reinforcement learning on a simulator of the design environment, using a reward function that measures how close the true causal graph is to a causal graph posterior inferred from the gathered data. On synthetic data and a single-cell gene expression simulator, we demonstrate empirically that the data acquired through our policy result in a better estimate of the underlying causal graph than alternative strategies. Our design policy successfully achieves amortized intervention design on the distribution of the training environment while also generalizing well to distribution shifts in test-time design environments. Further, our policy demonstrates excellent zero-shot generalization to design environments with dimensionality higher than that seen during training, and to intervention types that it has not been trained on.
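To make the reward idea concrete, here is a minimal sketch of one way to score how close a graph posterior is to the true graph. The abstract does not give the exact formula, so everything below is an assumption for illustration: the posterior is represented as a matrix of independent edge probabilities, the score is the expected number of off-diagonal adjacency entries that match the true graph, and the per-step reward is the improvement in that score after new data are gathered. The function names `expected_correct_entries` and `step_rewards` are hypothetical.

```python
def expected_correct_entries(edge_probs, true_adj):
    """Expected number of adjacency entries that match the true graph.

    Assumption: the posterior factorizes into independent Bernoulli
    edge probabilities, edge_probs[i][j] = P(edge i -> j).
    """
    n = len(true_adj)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # self-loops are not scored
            p = edge_probs[i][j]
            # An entry is correct with probability p if the edge exists,
            # and with probability 1 - p if it does not.
            total += p if true_adj[i][j] == 1 else 1.0 - p
    return total


def step_rewards(posteriors, true_adj):
    """Reward at each step: the gain in score over the previous posterior."""
    scores = [expected_correct_entries(p, true_adj) for p in posteriors]
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


# Two-variable example: the true graph has a single edge 0 -> 1.
true_adj = [[0, 1], [0, 0]]
uninformed = [[0.0, 0.5], [0.5, 0.0]]   # posterior before any intervention
sharpened = [[0.0, 0.9], [0.1, 0.0]]    # posterior after informative data
rewards = step_rewards([uninformed, sharpened], true_adj)
```

In this toy case the score rises from 1.0 to about 1.8, so the single step earns a reward of about 0.8. Note that the true graph is needed to compute the score, which is why a reward of this kind can only be evaluated in a simulator during training.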
