Coordinated Multi-Neighborhood Learning on a Directed Acyclic Graph
Stephen Smith, Qing Zhou
TL;DR
This work tackles causal discovery in high-dimensional settings by focusing on the local structure around user-specified target nodes rather than learning the full DAG. It introduces Coordinated Multi-Neighborhood Learning (CML), a two-stage constraint-based framework that first builds a maximal ancestral graph (MAG) over the union of target neighborhoods, NB_T, and then orients edges jointly across all neighborhoods through a coordinated application of the FCI rules. The authors prove population-level and Gaussian consistency results for the local MAG/PAG learned by CML. In synthetic experiments, CML shows substantial gains in accuracy and computational efficiency over global methods such as PC and over non-coordinated local methods (SNL); on real gene regulatory data, its performance is competitive. The findings suggest that coordinated local structure learning can yield more precise causal inferences at far lower computational cost, enabling scalable causal discovery focused on scientifically relevant subsets of variables.
Abstract
Learning the structure of causal directed acyclic graphs (DAGs) is useful in many areas of machine learning and artificial intelligence. However, in the high-dimensional setting, it is challenging to obtain good empirical and theoretical results without strong and often restrictive assumptions. Additionally, it is questionable whether all of the variables purported to be included in the network are observable. It is therefore of interest to restrict consideration to a subset of the variables for relevant and reliable inferences. In fact, researchers in various disciplines can usually select a set of target nodes in the network for causal discovery. This paper develops a new constraint-based method for estimating the local structure around multiple user-specified target nodes, enabling coordination in structure learning between neighborhoods. Our method facilitates causal discovery without learning the entire DAG structure. We establish consistency results for our algorithm with respect to the local neighborhood structure of the target nodes in the true graph. Experimental results on synthetic and real-world data show that our algorithm learns the neighborhood structures more accurately, and at much lower computational cost, than standard methods that estimate the entire DAG. An R package implementing our methods is available at https://github.com/stephenvsmith/CML.
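To make the idea of restricting attention to the neighborhoods of several target nodes concrete, the following Python sketch computes the union of target neighborhoods on a small, fully known toy DAG. It is only an illustration under stated assumptions, not the CML procedure or the API of the R package: it assumes that a node's neighborhood is the node together with its Markov blanket (parents, children, and other parents of its children), and the names `markov_blanket`, `neighborhood_union`, and the toy graph are hypothetical. In practice the graph is unknown and the neighborhoods must be estimated from data.

```python
from typing import Dict, Iterable, Set

# Assumption: the DAG is given as a map from each node to its set of parents.
# Every node must appear as a key, even if it has no parents.
ParentMap = Dict[str, Set[str]]


def markov_blanket(parents: ParentMap, node: str) -> Set[str]:
    """Return parents, children, and spouses (co-parents) of `node`."""
    children = {v for v, pa in parents.items() if node in pa}
    spouses = {p for c in children for p in parents[c]} - {node}
    return set(parents[node]) | children | spouses


def neighborhood_union(parents: ParentMap, targets: Iterable[str]) -> Set[str]:
    """Union over targets of {target} plus its Markov blanket.

    Assumption: this is the node set the local analysis is restricted to.
    """
    nodes: Set[str] = set()
    for t in targets:
        nodes |= {t} | markov_blanket(parents, t)
    return nodes


# Hypothetical toy DAG: A -> B -> C <- D, C -> E <- F, G -> F
toy_parents: ParentMap = {
    "A": set(),
    "B": {"A"},
    "C": {"B", "D"},
    "D": set(),
    "E": {"C", "F"},
    "F": {"G"},
    "G": set(),
}

nb_union = neighborhood_union(toy_parents, ["B", "E"])
print(sorted(nb_union))
```

Here node G falls outside both neighborhoods, so a local analysis of targets B and E would treat it as unobserved. This is one reason the structure over the retained nodes is described with an ancestral graph rather than simply with the corresponding subgraph of the DAG.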
