Score-based Greedy Search for Structure Identification of Partially Observed Linear Causal Models
Xinshuai Dong, Ignavier Ng, Haoyue Dai, Jiaqi Sun, Xiangchen Song, Peter Spirtes, Kun Zhang
TL;DR
The paper addresses the problem of identifying the full structure of partially observed linear causal models from observational data by developing a score-based greedy search framework. It introduces the Generalized N Factor Model (GNFM) and proves identifiability and global consistency: the likelihood score recovers the structure up to the Markov equivalence class (MEC). The Latent variable Greedy Equivalence Search (LGES) algorithm operationalizes this theory in two phases (latent-to-observed and latent-to-latent) and is shown to be asymptotically correct under GNFM, with strong empirical performance on synthetic and real datasets, robustness to misspecification, and practical runtime. The work provides a scalable, theoretically grounded approach for discovering latent and observed causal structure from covariances, with potential extensions to non-Gaussian and nonlinear settings.
Abstract
Identifying the structure of a partially observed causal system is essential to various scientific fields. Recent advances have focused on constraint-based causal discovery to solve this problem, yet in practice these methods often face challenges related to multiple testing and error propagation. A score-based method could mitigate these issues, which has drawn considerable attention to the question of whether a score-based greedy search method can handle the partially observed scenario. In this work, we propose the first score-based greedy search method for identifying structure involving latent variables with identifiability guarantees. Specifically, we propose the Generalized N Factor Model and establish global consistency: the true structure, including latent variables, can be identified up to the Markov equivalence class by using the score. We then design Latent variable Greedy Equivalence Search (LGES), a greedy search algorithm for this class of models with well-defined operators, which searches the graph space efficiently to find the optimal structure. Our experiments on both synthetic and real-life data validate the effectiveness of our method (code will be publicly available).
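
To make the idea of scoring a latent structure from covariances concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian likelihood evaluated on a covariance matrix and a BIC-style penalty; the penalty, the function names (`gaussian_log_likelihood`, `bic_score`), and the toy one-factor example with its loadings are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def gaussian_log_likelihood(S, Sigma, n):
    """Log-likelihood of n zero-mean Gaussian samples with sample covariance S
    under a model-implied covariance Sigma."""
    p = S.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    # trace(Sigma^{-1} S), computed without forming the explicit inverse
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return -0.5 * n * (logdet + trace_term + p * np.log(2.0 * np.pi))


def bic_score(S, Sigma, n, num_params):
    """Penalized likelihood score (higher is better).
    The BIC penalty is an assumption made for this sketch."""
    return gaussian_log_likelihood(S, Sigma, n) - 0.5 * num_params * np.log(n)


# Toy example (hypothetical values): one latent factor with three observed children.
loadings = np.array([0.9, 0.8, 0.7])
noise_var = 1.0 - loadings**2              # gives each observed variable unit variance
Sigma_factor = np.outer(loadings, loadings) + np.diag(noise_var)

S = Sigma_factor.copy()                    # stand-in for a sample covariance
n = 1000                                   # assumed sample size

# Candidate 1: the one-factor structure (3 loadings + 3 noise variances).
score_factor = bic_score(S, Sigma_factor, n, num_params=6)

# Candidate 2: no latent variable, observed variables mutually independent.
Sigma_indep = np.diag(np.diag(S))
score_indep = bic_score(S, Sigma_indep, n, num_params=3)

print("one-factor structure scores higher:", score_factor > score_indep)
```

A greedy search in this spirit would repeatedly apply the operator that most improves such a score and stop when no operator improves it. The operators and the two-phase schedule that LGES actually uses are defined in the paper and are not reproduced here.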
