Identifiability of latent causal graphical models without pure children
Seunghyun Lee, Yuqi Gu
TL;DR
This work studies the identifiability of causal graphical models with binary latent variables under a general nonparametric measurement model. It introduces a double triangular condition on the latent-to-observed graph $\Gamma$ that guarantees identifiability of the latent dimension $K$, the latent graph $\Lambda$, the latent distribution $\mathbb{P}(H)$, and the conditional distribution $\mathbb{P}(X\mid H)$, without relying on pure children or parametric forms. It also derives necessary conditions (three observed children per latent variable and a subset condition on $\Gamma$) that delineate fundamental limits of identifiability, and it proves sufficiency of the proposed condition through a tensor-decomposition argument based on Kruskal's theorem. Simulations show that latent structures satisfying the conditions can be accurately recovered from data. Overall, the results relax the pure children requirement of prior work and extend latent causal discovery to data with mixed-type observed variables.
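As a rough illustration of the Kruskal-type argument mentioned above, the sketch below checks Kruskal's classical sufficient condition for uniqueness of a three-way decomposition $T = \sum_{r=1}^{R} a_r \otimes b_r \otimes c_r$, namely $k_A + k_B + k_C \ge 2R + 2$, where $k_M$ is the Kruskal rank of the factor matrix $M$. The toy setup (a single binary latent variable, so $R = 2$, with three binary observed children) and the names `kruskal_rank`, `kruskal_condition`, `A`, `B`, `C`, and `C_flat` are assumptions made for illustration; this is not the paper's proof or estimation procedure.

```python
from itertools import combinations

import numpy as np


def kruskal_rank(M, tol=1e-10):
    """Largest k such that every set of k columns of M is linearly independent."""
    n_cols = M.shape[1]
    k = 0
    for size in range(1, n_cols + 1):
        subsets = combinations(range(n_cols), size)
        if all(np.linalg.matrix_rank(M[:, list(s)], tol=tol) == size for s in subsets):
            k = size
        else:
            break
    return k


def kruskal_condition(A, B, C):
    """Check Kruskal's sufficient condition k_A + k_B + k_C >= 2R + 2."""
    R = A.shape[1]  # number of rank-one terms (here: latent configurations)
    return kruskal_rank(A) + kruskal_rank(B) + kruskal_rank(C) >= 2 * R + 2


# Hypothetical conditional probability tables P(X_j = x | H = h):
# rows index the observed value x, columns index the latent state h.
A = np.array([[0.9, 0.2], [0.1, 0.8]])
B = np.array([[0.7, 0.3], [0.3, 0.7]])
C = np.array([[0.6, 0.1], [0.4, 0.9]])

# A child whose distribution does not depend on H has identical columns.
C_flat = np.array([[0.5, 0.5], [0.5, 0.5]])

holds_with_three_children = kruskal_condition(A, B, C)
holds_with_flat_child = kruskal_condition(A, B, C_flat)
```

With three informative children each factor matrix has Kruskal rank 2, so the sum reaches $2R + 2 = 6$; replacing one child by an uninformative one lowers the sum to 5. Kruskal's condition is only sufficient, so a failed check means the theorem does not apply, not that the model is necessarily non-identifiable.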
Abstract
This paper considers the challenging problem of identifying a causal graphical model in the presence of latent variables. While various identifiability conditions have been proposed in the literature, they often require multiple pure children per latent variable or impose restrictions on the latent causal graph. Furthermore, they commonly assume that all observed variables share the same modality. Consequently, the existing identifiability conditions are often too stringent for complex real-world data. We consider a general nonparametric measurement model with arbitrary observed variable types and binary latent variables, and propose a double triangular graphical condition that guarantees identifiability of the entire causal graphical model. The proposed condition significantly relaxes the popular pure children condition. We also establish necessary conditions for identifiability, which provide insight into the fundamental limits of identifiability. Simulation studies verify that latent structures satisfying our conditions can be accurately estimated from data.
