Variational Causal Inference
Yulun Wu, Layne C. Price, Zichen Wang, Vassilis N. Ioannidis, Robert A. Barton, George Karypis
TL;DR
The paper addresses the estimation of high-dimensional individualized counterfactual outcomes from limited covariates by introducing Variational Causal Inference (VCI), a semi-autoencoding, variational Bayesian framework. It jointly models the factual outcome via a latent representation $Z$ and the counterfactual via a covariate-specific distribution $p(Y'|X,T')$, deriving an ELBO-based objective that couples reconstruction and counterfactual likelihood while regularizing latent distributions. An efficient influence-function-based estimator for covariate-specific and marginal effects is provided, enabling asymptotically efficient estimation of $\boldsymbol{\Psi}(p) = \text{E}_p[Y'_{\text{do}(T'=a)}]$ and related quantities. Empirical evaluation on single-cell perturbation data demonstrates that VCI outperforms state-of-the-art baselines in out-of-distribution prediction and in robust marginal estimation, highlighting its practical value when outcomes are high-dimensional and covariates are limited.
Abstract
Estimating an individual's potential outcomes under counterfactual treatments is a challenging task for traditional causal inference and supervised learning approaches when the outcome is high-dimensional (e.g., gene expressions, impulse responses, human faces) and covariates are relatively limited. In this case, to construct an individual's outcome under a counterfactual treatment, it is crucial to leverage the individual information contained in that individual's observed factual outcome, in addition to the covariates. We propose a deep variational Bayesian framework that rigorously integrates two main sources of information for outcome construction under a counterfactual treatment: one source is the individual features embedded in the high-dimensional factual outcome; the other source is the response distribution of similar subjects (subjects with the same covariates) that factually received the treatment of interest.
