Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang

TL;DR

This work introduces the Generalized Independent Noise (GIN) condition, which extends the Independent Noise (IN) condition to linear non-Gaussian acyclic models with latent variables and enables identification of latent structure and of causal directions among latent and observed variables. The authors give graphical criteria for GIN under rank faithfulness, building on side choke-point sets and trek-separation, and show that GIN can reveal latent hierarchies beyond measurement models. They define LiNGLaH (linear non-Gaussian latent hierarchical) models and propose a two-phase algorithm, LaHiCaSl, that locates latent variables (Phase I) and infers causal relations among them (Phase II), using surrogates so that tests do not require observing latent nodes directly. The causal structure is shown to be identifiable under mild assumptions, and the method performs well on synthetic data and on real-world datasets (teacher burnout, multitasking, and mental ability), outperforming several baseline approaches in both latent-location accuracy and causal-order recovery. The framework broadens causal discovery to complex latent structures and offers practical procedures for structure learning with limited data, with potential extensions to nonlinear settings.

Abstract

We investigate the task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which establishes the independence between a linear combination of certain measured variables and some other measured variables. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are independent, where $\omega$ is a non-zero parameter vector determined by the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. We then give necessary and sufficient graphical criteria of the GIN condition in linear non-Gaussian acyclic models. Roughly speaking, GIN implies the existence of a set $\mathcal{S}$ such that $\mathcal{S}$ is causally earlier (w.r.t. the causal ordering) than $\mathbf{Y}$, and that every active (collider-free) path between $\mathbf{Y}$ and $\mathbf{Z}$ must contain a node from $\mathcal{S}$. Interestingly, we find that the independent noise condition (i.e., if there is no confounder, causes are independent of the residual derived from regressing the effect on the causes) can be seen as a special case of GIN. With such a connection between GIN and latent causal structures, we further leverage the proposed GIN condition, together with a well-designed search procedure, to efficiently estimate Linear, Non-Gaussian Latent Hierarchical Models (LiNGLaHs), where latent confounders may also be causally related and may even follow a hierarchical structure. We show that the causal structure of a LiNGLaH is identifiable in light of GIN conditions. Experimental results show the effectiveness of the proposed method.
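
The GIN condition described above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a toy model with two latent variables (`L1 -> L2`, each with two observed children), takes $\omega$ as a left null vector of the sample cross-covariance of $\mathbf{Y}$ and $\mathbf{Z}$, and replaces a proper independence test with a crude correlation-of-squares score. The function names `gin_omega` and `dependence_score`, the uniform noise, and the unit coefficients are all illustrative assumptions.

```python
import numpy as np


def gin_omega(Y, Z):
    """Unit vector w with w^T Cov(Y, Z) ~ 0 (left null vector of the cross-covariance)."""
    Yc = Y - Y.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    cross_cov = Yc.T @ Zc / len(Y)  # shape (dim Y, dim Z)
    U, _, _ = np.linalg.svd(cross_cov, full_matrices=True)
    return U[:, -1]  # orthogonal to the column space when dim Y > dim Z


def dependence_score(Y, Z):
    """Crude stand-in for an independence test: max |corr(e^2, z_j^2)| with e = w^T Y."""
    e = Y @ gin_omega(Y, Z)
    e = e - e.mean()
    Zc = Z - Z.mean(axis=0)
    return max(abs(np.corrcoef(e**2, Zc[:, j] ** 2)[0, 1]) for j in range(Z.shape[1]))


rng = np.random.default_rng(0)
n = 200_000


def noise():
    # Zero-mean, unit-variance uniform noise (non-Gaussian by construction).
    return rng.uniform(-np.sqrt(3), np.sqrt(3), n)


# Hypothetical toy model: L1 -> L2, L1 -> {X1, X2}, L2 -> {X3, X4}.
L1 = noise()
L2 = L1 + noise()
X = np.column_stack([L1 + noise(), L1 + noise(), L2 + noise(), L2 + noise()])

# Z = {X1}, Y = {X3, X4}: w cancels the shared latent L2, so w^T Y is pure noise.
score_satisfied = dependence_score(X[:, [2, 3]], X[:, [0]])

# Z = {X4}, Y = {X1, X3}: w^T Y is uncorrelated with Z but still shares L1 with it.
score_violated = dependence_score(X[:, [0, 2]], X[:, [3]])

print(f"score when GIN should hold: {score_satisfied:.4f}")
print(f"score when GIN should fail: {score_violated:.4f}")
```

In both cases $\omega^{\intercal}\mathbf{Y}$ is uncorrelated with $\mathbf{Z}$ by construction, so only a higher-order statistic can separate them; this is where non-Gaussianity matters. In practice a kernel-based or other nonparametric independence test would replace the score used here.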

Paper Structure

This paper contains 33 sections, 25 theorems, 26 equations, 18 figures, 12 tables, 9 algorithms.

Key Result

Proposition 1

Suppose all considered variables follow a linear non-Gaussian acyclic causal model and all variables are observed. Let $\mathbf{Z}$ be a subset of those variables and $Y$ be a single variable. Then the following two statements are equivalent.

Figures (18)

  • Figure 1: A hierarchical causal structure involving 9 latent variables (shaded nodes) and 13 observed variables (unshaded nodes).
  • Figure 2: A causal structure involving 4 latent variables and 8 observed variables, where each pair of observed variables in $\{X_1,X_2,X_3,X_4\}$ is affected by two latent variables.
  • Figure 3: (a) Illustrations of the graphical conditions in Theorem 2, where only the active paths between nodes are drawn, and dashed lines with ✘ indicate the absence of edges. (b) An illustrative example of Theorem 2.
  • Figure 4: Examples of minimal latent structure. (a) An identifiable latent structure. (b) A non-identifiable latent structure, because $L_6$ has fewer than 3 neighbor nodes and $\{L_7,L_8\}$ has fewer than 5 neighbor nodes.
  • Figure 5: An illustrative example of Proposition \ref{Proposition-causa-direction-behind-confounders}.
  • ...and 13 more figures

Theorems & Definitions (68)

  • Definition 1: IN Condition
  • Proposition 1: Graphical Criterion of IN Condition
  • Definition 2: GIN Condition
  • Theorem 1: Mathematical Characterization of GIN
  • Example 1
  • Proposition 2: Connection between IN and GIN
  • Definition 3: Rank Faithfulness (Spirtes, 2013)
  • Definition 4: Side Choke-Point Set
  • Theorem 2: Graphical Criteria of GIN
  • Example 2
  • ...and 58 more