On the Parameter Identifiability of Partially Observed Linear Causal Models
Xinshuai Dong, Ignavier Ng, Biwei Huang, Yuewen Sun, Songyao Jin, Roberto Legaspi, Peter Spirtes, Kun Zhang
TL;DR
This work studies parameter identifiability in partially observed linear causal models with latent variables, covering the coefficients of all edges rather than only those between observed variables. It develops a theory that identifies three types of indeterminacy, provides sufficient graphical conditions for structure and parameter identifiability (including atomic covers and trek-based criteria), and proposes a likelihood-based estimation method that uses a trek-rule covariance parameterization to handle variance indeterminacy. Empirical results on synthetic data (GS and OT regimes) and real-world data (Big Five) show accurate recovery of parameters up to the stated indeterminacies and robust performance under mild misspecification. The paper advances practical causal modeling with latent variables by delivering both identifiability guarantees and a scalable estimation approach suitable for real datasets.
Abstract
Linear causal models are important tools for modeling causal dependencies, yet in practice only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research: we allow all variables, both observed and latent, to be flexibly related, and we consider the coefficients of all edges, whereas most existing works focus only on the edges between observed variables. Theoretically, we identify three types of indeterminacy for the parameters in partially observed linear causal models. We then provide graphical conditions that are sufficient for all parameters to be identifiable and show that some of them are provably necessary. Methodologically, we propose a novel likelihood-based parameter estimation method that addresses the variance indeterminacy of latent variables in a specific way and can asymptotically recover the underlying parameters up to trivial indeterminacy. Empirical studies on both synthetic and real-world datasets validate our identifiability theory and the effectiveness of the proposed method in the finite-sample regime. Code: https://github.com/dongxinshuai/scm-identify.
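
To make the variance indeterminacy of latent variables concrete, the following is a minimal numerical sketch; it is not the paper's estimation method. It assumes a toy graph with one latent variable L and three observed children, the linear model X = BX + e with independent noise, and the hypothetical helper names `implied_covariance` and `rescale_latent`. Rescaling L by a constant c changes the edge coefficients and the latent noise variance but leaves the covariance of the observed variables unchanged, so observed data alone cannot fix the scale of L.

```python
import numpy as np

def implied_covariance(B, omega):
    """Covariance of X under X = B X + e with Cov(e) = diag(omega).

    B[i, j] is the coefficient of the edge j -> i.
    """
    n = B.shape[0]
    inv = np.linalg.inv(np.eye(n) - B)
    return inv @ np.diag(omega) @ inv.T

def rescale_latent(B, omega, idx, c):
    """Reparameterize the model after replacing variable idx by c times itself."""
    B2 = B.astype(float).copy()
    omega2 = np.array(omega, dtype=float)
    B2[idx, :] *= c        # edges into the rescaled variable
    B2[:, idx] /= c        # edges out of the rescaled variable
    omega2[idx] *= c ** 2  # its noise variance scales by c^2
    return B2, omega2

# Toy model (an assumption for illustration): variable order [L, X1, X2, X3].
B = np.zeros((4, 4))
B[1, 0], B[2, 0], B[3, 0] = 0.8, -1.2, 0.5
omega = np.array([1.0, 0.3, 0.4, 0.5])

B_scaled, omega_scaled = rescale_latent(B, omega, idx=0, c=2.0)

sigma = implied_covariance(B, omega)
sigma_scaled = implied_covariance(B_scaled, omega_scaled)

obs = [1, 2, 3]  # indices of the observed variables
same_observed_cov = np.allclose(sigma[np.ix_(obs, obs)],
                                sigma_scaled[np.ix_(obs, obs)])
print("observed covariance unchanged:", same_observed_cov)
print("edge coefficients changed:", not np.allclose(B, B_scaled))
```

Fixing the scale of each latent variable by some convention removes this freedom; which convention the paper adopts is specified in the paper itself, not in this sketch.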
