Optimal Transport for Structure Learning Under Missing Data
Vy Vo, He Zhao, Trung Le, Edwin V. Bonilla, Dinh Phung
TL;DR
This work tackles causal discovery when data are missing and shows that imputing first and learning structure afterwards is suboptimal. It introduces OTM, a score-based framework built on optimal transport that jointly learns the imputations and the causal graph by minimizing the Wasserstein distance between the observed-data distribution and the model distribution, using a learnable imputation and a push-forward to align completed samples with the structural causal model (SCM). The approach is agnostic to the base complete-data causal discovery method and accommodates nonlinear additive-noise models. In most simulations and on real biological datasets, it recovers the true graph more accurately than competing methods and scales better. The method offers a principled way to perform structure learning under missing-at-random (MAR) and missing-not-at-random (MNAR) settings and highlights identifiability considerations, which matters for robust causal inference on messy real-world data.
Abstract
Causal discovery in the presence of missing data introduces a chicken-and-egg dilemma. While the goal is to recover the true causal structure, robust imputation requires considering the dependencies or, preferably, causal relations among variables. Merely filling in missing values with existing imputation methods and subsequently applying structure learning on the complete data is empirically shown to be sub-optimal. To address this problem, we propose a score-based algorithm for learning causal structures from missing data based on optimal transport. This optimal transport viewpoint diverges from existing score-based approaches, which are predominantly based on expectation maximization. We formulate structure learning as a density fitting problem, where the goal is to find the causal model whose induced distribution has minimum Wasserstein distance to the observed data distribution. Our framework is shown to recover the true causal graphs more effectively than competing methods in most simulations and real-data settings. Empirical evidence also shows the superior scalability of our approach, along with the flexibility to incorporate any off-the-shelf causal discovery method for complete data.
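
To make the density-fitting view concrete, the following minimal sketch scores two candidate causal models by how close their induced samples are to the observed samples. It is an illustration under stated assumptions, not the OTM algorithm: it uses complete data only (no missingness or learned imputation), it substitutes a Monte Carlo sliced Wasserstein estimate for the Wasserstein distance, and the two toy models and all function names (sliced_w2_squared, sample_chain, sample_empty) are hypothetical.

```python
import numpy as np

def sliced_w2_squared(a, b, n_projections=64, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance
    between two equal-size samples a and b, each of shape (n, d)."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, a.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # In 1D, optimal transport between equal-size samples pairs sorted values.
    pa = np.sort(a @ dirs.T, axis=0)
    pb = np.sort(b @ dirs.T, axis=0)
    return float(np.mean((pa - pb) ** 2))

def sample_chain(n, rng):
    """Hypothetical SCM with edge X -> Y: Y = tanh(2X) + 0.1 * noise."""
    x = rng.normal(size=n)
    y = np.tanh(2 * x) + 0.1 * rng.normal(size=n)
    return np.column_stack([x, y])

def sample_empty(n, rng):
    """Hypothetical SCM with no edge: same marginals, X and Y independent."""
    x = rng.normal(size=n)
    y = np.tanh(2 * rng.normal(size=n)) + 0.1 * rng.normal(size=n)
    return np.column_stack([x, y])

rng = np.random.default_rng(0)
n = 2000
observed = sample_chain(n, rng)  # stand-in for the observed data

# Lower score means the candidate's induced distribution fits the data better.
score_chain = sliced_w2_squared(observed, sample_chain(n, rng))
score_empty = sliced_w2_squared(observed, sample_empty(n, rng))
print(f"X -> Y: {score_chain:.4f}   no edge: {score_empty:.4f}")
```

The candidate that contains the edge X -> Y should receive the lower score, because the edgeless model matches the marginals but not the joint distribution. A full method would additionally have to search over graphs, fit the mechanisms, and handle the missing entries.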
