Nonlinear Causal Discovery through a Sequential Edge Orientation Approach
Stella Huang, Qing Zhou
TL;DR
This work tackles nonlinear causal discovery by combining identifiability results for restricted additive noise models (ANMs) with a sequential edge orientation strategy. Starting from a completed partial DAG (CPDAG), a criterion based on the pairwise additive noise model (PANM) identifies an orientable undirected edge, which is then directed via a likelihood-ratio test that compares the competing conditional models on the relevant subgraph. The method, termed SNOE, consistently recovers the true DAG in the large-sample limit and, across synthetic and real datasets including the Sachs flow-cytometry network and the Tübingen cause-effect pairs, achieves higher accuracy and lower running time than existing nonlinear DAG learners. In practice, SNOE offers a scalable, robust alternative to kernel-based and optimization-based approaches: its explicit edge ordering and principled orientation test mitigate model misspecification and reduce reliance on exhaustive search over graph space.
Abstract
Recent advances have established the identifiability of a directed acyclic graph (DAG) under additive noise models (ANMs), spurring the development of various causal discovery methods. However, most existing methods make restrictive model assumptions, rely heavily on general independence tests, or require substantial computational time. To address these limitations, we propose a sequential procedure to orient undirected edges in a completed partial DAG (CPDAG), representing an equivalence class of DAGs, by leveraging the pairwise additive noise model (PANM) to identify their causal directions. We prove that this procedure can recover the true causal DAG assuming a restricted ANM. Building on this result, we develop a novel constraint-based algorithm for learning causal DAGs under nonlinear ANMs. Given an estimated CPDAG, we develop a ranking procedure that sorts undirected edges by their adherence to the PANM, which defines an evaluation order of the edges. To determine the edge direction, we devise a statistical test that compares the log-likelihood values, evaluated with respect to the competing directions, of a sub-graph comprising just the candidate nodes and their identified parents in the partial DAG. We further establish the structural learning consistency of our algorithm in the large-sample limit. Extensive experiments on synthetic and real-world datasets demonstrate that our method is computationally efficient, robust to model misspecification, and consistently outperforms many existing nonlinear DAG learning methods.
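To make the idea of comparing log-likelihoods under the two competing directions concrete, the following is a minimal sketch for a single pair of variables with no other parents. It is not the authors' implementation. The polynomial regression, the Gaussian likelihood for the marginal and the residuals, the Vuong-style z-statistic, and the names `direction_loglik` and `orient_edge` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_loglik(resid):
    """Per-sample Gaussian log-density of residuals at the MLE variance."""
    var = np.mean(resid ** 2)
    return -0.5 * np.log(2 * np.pi * var) - resid ** 2 / (2 * var)


def poly_residuals(cause, effect, degree=3):
    """Residuals of a polynomial regression of `effect` on `cause`."""
    coefs = np.polyfit(cause, effect, degree)
    return effect - np.polyval(coefs, cause)


def direction_loglik(cause, effect, degree=3):
    """Per-sample log-likelihood of the model cause -> effect:
    log p(cause) + log p(effect | cause), both Gaussian (an assumption)."""
    marginal = gaussian_loglik(cause - cause.mean())
    conditional = gaussian_loglik(poly_residuals(cause, effect, degree))
    return marginal + conditional


def orient_edge(x, y, degree=3):
    """Compare the two directions with a z-statistic on the per-sample
    log-likelihood differences (a generic Vuong-style comparison)."""
    # Standardize for numerical stability; this shifts both directions'
    # log-likelihoods by the same constant, so the difference is unchanged.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    diff = direction_loglik(x, y, degree) - direction_loglik(y, x, degree)
    z = np.sqrt(len(diff)) * diff.mean() / diff.std(ddof=1)
    return ("x -> y" if z > 0 else "y -> x"), z


# Simulated nonlinear ANM with true direction x -> y.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x ** 3 + 0.5 * rng.normal(size=2000)

direction, z = orient_edge(x, y)
print(direction, round(float(z), 2))
```

In the procedure described in the abstract, each conditional model would also include the already identified parents of the two candidate nodes in the partial DAG. The sketch omits them to keep the comparison easy to follow.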
