Nonlinear Causal Discovery through a Sequential Edge Orientation Approach

Stella Huang, Qing Zhou

TL;DR

This work tackles nonlinear causal discovery by combining identifiability results for restricted additive noise models (ANMs) with a sequential edge orientation strategy. Starting from a CPDAG, a criterion based on the pairwise additive noise model (PANM) selects an undirected edge that can be oriented. That edge is then directed by a likelihood-ratio test, which compares the competing conditional models on the relevant subgraph. The resulting method, SNOE, consistently recovers the true DAG in the large-sample limit. It is also more accurate and computationally efficient than many existing nonlinear DAG learners on synthetic and real datasets, including the Sachs flow-cytometry network and the Tübingen cause-effect pairs. In practice, SNOE offers a scalable, robust alternative to kernel-based and optimization-based approaches. Its explicit edge ordering and principled orientation test improve robustness to model misspecification and reduce reliance on exhaustive search over graph space.
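The sequential strategy summarized above can be sketched as a generic driver loop. This is a minimal sketch, not the authors' SNOE implementation. The three steps are passed in as hypothetical callables:

- `score` ranks an edge by its adherence to the PANM;
- `orient` decides the edge's direction;
- `propagate` applies rules such as Meek's.

Storing the graph as a set of undirected edges plus a set of arcs is our own choice. Re-scoring all remaining edges after every orientation is also our assumption about how the evaluation order is refreshed.

```python
from typing import Callable, FrozenSet, Set, Tuple

Edge = FrozenSet[str]   # undirected edge {u, v}
Arc = Tuple[str, str]   # directed edge (u, v), meaning u -> v


def sequential_orientation(
    undirected: Set[Edge],
    directed: Set[Arc],
    score: Callable[[Edge, Set[Arc]], float],
    orient: Callable[[Edge, Set[Arc]], Arc],
    propagate: Callable[[Set[Edge], Set[Arc]], None],
) -> Set[Arc]:
    """Orient undirected edges one at a time, lowest score first.

    score:     hypothetical PANM-adherence score (lower = orient sooner)
    orient:    hypothetical direction decision, e.g. a likelihood comparison
    propagate: hypothetical rule propagation (e.g. Meek's rules) that
               mutates both sets in place
    """
    undirected = set(undirected)  # work on copies; inputs stay unchanged
    directed = set(directed)
    while undirected:
        # Re-score every remaining edge against the current partial DAG,
        # since newly oriented edges can reveal additional parents.
        # Ties are broken by the sorted node names for determinism.
        best = min(undirected, key=lambda e: (score(e, directed), sorted(e)))
        undirected.remove(best)
        directed.add(orient(best, directed))
        propagate(undirected, directed)
    return directed
```

Each iteration removes at least one undirected edge, so the loop always terminates. Propagation may remove more than one edge per step, which is how a single test can settle several directions.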

Abstract

Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require substantial computational time. To address these limitations, we propose a sequential procedure that orients the undirected edges of a completed partially directed acyclic graph (CPDAG), which represents an equivalence class of DAGs. The procedure leverages the pairwise additive noise model (PANM) to identify the causal directions of these edges. We prove that this procedure can recover the true causal DAG under a restricted ANM. Building on this result, we develop a novel constraint-based algorithm for learning causal DAGs under nonlinear ANMs. Given an estimated CPDAG, a ranking procedure sorts the undirected edges by their adherence to the PANM, which defines the order in which the edges are evaluated. To determine each edge's direction, we devise a statistical test that compares the log-likelihoods under the two competing directions. The test is evaluated on a subgraph consisting only of the candidate nodes and their identified parents in the partial DAG. We further establish the structural learning consistency of our algorithm in the large-sample limit. Extensive experiments on synthetic and real-world datasets demonstrate that our method is computationally efficient, robust to model misspecification, and consistently outperforms many existing nonlinear DAG learning methods.
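To make the direction test concrete, here is a minimal sketch for the simplest case, in which neither candidate node has any identified parents. It is not the paper's test. Three pieces are our assumptions:

- the nonlinear regression is a cubic polynomial fit;
- noise densities are leave-one-out Gaussian kernel density estimates;
- the per-sample log-likelihood differences are turned into a Vuong-style z statistic.

Under an ANM $X \rightarrow Y$ with $Y = f(X) + \varepsilon$, the joint density factorizes as $p_X(x)\,p_\varepsilon(y - f(x))$. Each direction's log-likelihood is therefore the log-density of the putative cause plus that of the regression residual.

```python
import numpy as np


def loo_kde_logpdf(z: np.ndarray) -> np.ndarray:
    """Leave-one-out Gaussian KDE log-density evaluated at each sample."""
    n = z.size
    h = 1.06 * z.std(ddof=1) * n ** (-1 / 5)  # Silverman's rule of thumb
    d = (z[:, None] - z[None, :]) / h
    k = np.exp(-0.5 * d**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(k, 0.0)  # exclude each point from its own estimate
    return np.log(k.sum(axis=1) / ((n - 1) * h))


def direction_loglik(cause: np.ndarray, effect: np.ndarray, degree: int = 3) -> np.ndarray:
    """Estimated per-sample log-likelihood of cause -> effect under an ANM."""
    coefs = np.polyfit(cause, effect, degree)  # assumed regression model
    resid = effect - np.polyval(coefs, cause)
    return loo_kde_logpdf(cause) + loo_kde_logpdf(resid)


def compare_directions(x: np.ndarray, y: np.ndarray, degree: int = 3) -> float:
    """Vuong-style z statistic: z > 0 favors x -> y, z < 0 favors y -> x."""
    diff = direction_loglik(x, y, degree) - direction_loglik(y, x, degree)
    return float(np.sqrt(diff.size) * diff.mean() / diff.std(ddof=1))
```

For example, take 500 draws of $Y = X^2 + 0.1\,\varepsilon$ with Gaussian $X$ and $\varepsilon$. `compare_directions(x, y)` then returns a large positive value, and swapping the arguments flips its sign. With identified parents, both the regressions and the cause densities would additionally condition on those parents. That extension is omitted here.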

Paper Structure

This paper contains 39 sections, 6 theorems, 29 equations, 16 figures, 4 tables, 4 algorithms.

Key Result

Lemma 1

Assume $\{X_{i}\}_{i=1}^{p}$ follows a restricted ANM with respect to a DAG $\mathcal{G}_{0}$. Suppose two nodes $X, Y$ are connected by an undirected edge in a PDAG $\mathcal{G}$ that has a consistent extension to $\mathcal{G}_{0}$. If $[X, Y \mid \text{pa}_{\mathcal{G}}(X), \text{pa}_{\mathcal{G}}(Y)]$ …
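Lemma 1 and Figure 1 hinge on whether a pair, given its currently identified parents, behaves like an additive noise model. One hypothetical way to turn this into a ranking score uses the Hilbert-Schmidt independence criterion (HSIC). The idea is to regress each node on the other and measure, with HSIC, how dependent the residual remains on the regressor. A pair that fits an ANM in at least one direction then gets a small score.

The following are all our assumptions rather than the paper's criterion:

- the score itself;
- the cubic regression;
- the Gaussian kernels;
- the bandwidth rule.

For brevity the sketch also ignores parent sets.

```python
import numpy as np


def _gram(z: np.ndarray) -> np.ndarray:
    """Gaussian-kernel Gram matrix, bandwidth from the median squared distance."""
    d2 = (z[:, None] - z[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])  # median heuristic (one common variant)
    return np.exp(-d2 / sigma2)


def hsic(a: np.ndarray, b: np.ndarray) -> float:
    """Biased empirical HSIC between two 1-D samples (larger = more dependent)."""
    n = a.size
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(_gram(a) @ h @ _gram(b) @ h) / (n - 1) ** 2)


def panm_score(x: np.ndarray, y: np.ndarray, degree: int = 3) -> float:
    """Hypothetical adherence score: residual dependence in the better direction."""
    def resid(cause: np.ndarray, effect: np.ndarray) -> np.ndarray:
        return effect - np.polyval(np.polyfit(cause, effect, degree), cause)

    return min(hsic(x, resid(x, y)), hsic(y, resid(y, x)))
```

An edge generated as $Y = X^2 + 0.1\,\varepsilon$ should score lower than one with multiplicative noise $Y = X\varepsilon$. The latter has residuals that remain dependent on the regressor in both directions.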

Figures (16)

  • Figure 1: Examples illustrating the PANM. The top row shows the true DAG, and the bottom row shows a PDAG extendable to the true DAG, with the evaluated edge $X-Y$. (a) $[X, Y\mid A,B]$ satisfies the PANM because both parent sets are fully identified. (b) $[X, Y\mid\varnothing, B]$ satisfies the PANM even though $A$ and $B$ are missing from $\text{pa}_{\mathcal{G}}(X)=\varnothing$, since we can write $X = \widetilde{\varepsilon}_{X} = g(A, B) + \varepsilon_{X}$. (c) $[X, Y \mid A, B]$ does not form a PANM, because the common parent $Z$ is not detected and becomes a latent confounder in the model. (d) $[X, Y]$ does not satisfy the PANM because node $Y$ is missing its parent $A$, so $\varepsilon_{Y} \perp\!\!\!\!\perp X$ is not guaranteed.
  • Figure 2: Orientation rules of Line … in Algorithm … (the population version of SNOE). For each of the three cases in which $k \in \text{nc}_{\mathcal{G}}(i) \cap \text{nc}_{\mathcal{G}}(j)$ (top panel), the bottom panel shows the corresponding orientation of the red edge(s).
  • Figure 3: An illustration of the edge orientation procedure. (a) The true DAG. (b) The CPDAG, with $X_{1}-X_{2}$ highlighted as the next edge to orient because it satisfies the pairwise ANM. (c) The PDAG obtained after orienting $X_{1} \rightarrow X_{2}$ and applying Meek's rules; edges $X_{3}-X_{4}$ and $X_{5}-X_{6}$ follow the pairwise ANM and can be oriented. (d) The true DAG is correctly recovered after orienting edge $X_{6}-X_{7}$, which is ranked last because the parent $X_{5}$ of $X_{6}$ is missing.
  • Figure 4: Example graphs used for testing the ranking procedure. Red edges indicate the undirected edges in the CPDAG that are evaluated and ranked by the procedure.
  • Figure 5: Type I error of the likelihood ratio test applied to the targeted edge in various CPDAG structures. Black lines indicate the significance levels.
  • ...and 11 more figures
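Figure 3 relies on Meek's rules to propagate each new orientation. A minimal sketch of the first two rules follows:

- R1 orients $b - c$ as $b \rightarrow c$ when some $a \rightarrow b$ exists with $a$ not adjacent to $c$.
- R2 orients $a - c$ as $a \rightarrow c$ when a directed path $a \rightarrow b \rightarrow c$ already exists.

Rules R3 and R4 are omitted, so this is not a complete Meek closure. The set-based graph representation is our own choice.

```python
from typing import FrozenSet, Set, Tuple

Arc = Tuple[str, str]   # (u, v) means u -> v
Edge = FrozenSet[str]   # undirected edge {u, v}


def adjacent(u: str, v: str, und: Set[Edge], arcs: Set[Arc]) -> bool:
    """True if u and v are joined by any edge, directed or undirected."""
    return frozenset((u, v)) in und or (u, v) in arcs or (v, u) in arcs


def apply_meek_r1_r2(und: Set[Edge], arcs: Set[Arc]) -> None:
    """Apply Meek's rules R1 and R2 in place until no edge changes."""
    changed = True
    while changed:
        changed = False
        for edge in list(und):
            u, v = tuple(edge)
            for x, y in ((u, v), (v, u)):  # candidate orientation x -> y
                # R1: some arc a -> x whose tail a is not adjacent to y.
                r1 = any(w == x and not adjacent(a, y, und, arcs) for a, w in arcs)
                # R2: a directed path x -> m -> y already exists.
                r2 = any(t == x and (m, y) in arcs for t, m in arcs)
                if r1 or r2:
                    und.discard(edge)
                    arcs.add((x, y))
                    changed = True
                    break
```

For example, starting from $a \rightarrow b$, $b - c$ and $c - d$, R1 fires twice and yields the chain $a \rightarrow b \rightarrow c \rightarrow d$.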

Theorems & Definitions (13)

  • Definition 1: Pairwise Additive Noise Model
  • Lemma 1
  • Theorem 1
  • Remark 1
  • Definition 2: Undirected Component in PDAG
  • Proposition 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Theorem 2
  • ...and 3 more