An efficient search-and-score algorithm for ancestral graphs using multivariate information scores
Nikita Lagrange, Herve Isambert
TL;DR
This work addresses causal discovery for ancestral graphs, which arise in the presence of latent variables. It derives a decomposition of the likelihood in terms of multivariate cross-information over ac-connected subsets of vertices, which can be estimated from observational data. Building on this result, it develops a two-step search-and-score algorithm, MIIC_search&score, that relies only on local computations: starting from MIIC predictions, it uses higher-order local information scores (expressed through $H$ and $I$ terms) with a normalized maximum likelihood regularization to prune edges and orient them while avoiding cycles. Theoretical results (Theorem 1 and Proposition 3) show that the likelihood can be expressed as a sum over ac-connected subsets and that these contributions can be estimated empirically. In experiments on challenging discrete datasets with latent variables, MIIC_search&score outperforms MIIC and FCI, achieving higher precision at preserved recall and remaining robust to sampling variation, which makes it a scalable approach to causal learning in complex graphs.
Abstract
We propose a greedy search-and-score algorithm for ancestral graphs, which include directed edges as well as bidirected edges originating from unobserved latent variables. The normalized likelihood score of ancestral graphs is estimated in terms of multivariate information over relevant "ac-connected subsets" of vertices, C, that are connected through collider paths confined to the ancestor set of C. For computational efficiency, the proposed two-step algorithm relies on local information scores limited to the close surrounding vertices of each node (step 1) and edge (step 2). Although this computational strategy is restricted to information contributions from ac-connected subsets containing up to two-collider paths, it is shown to outperform state-of-the-art causal discovery methods on challenging benchmark datasets.
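To make the notion of multivariate information over a subset of vertices concrete, the following minimal sketch estimates it from discrete samples using the standard inclusion-exclusion formula over joint entropies, $I(X_1;\dots;X_k) = -\sum_{S \subseteq \{X_1,\dots,X_k\},\, S \neq \emptyset} (-1)^{|S|} H(S)$. It is an illustration under stated assumptions, not the scoring function of MIIC_search&score: the function names `entropy` and `multivariate_information` are hypothetical, entropies are plug-in estimates in nats, and neither the restriction to ac-connected subsets nor the normalized maximum likelihood regularization is reproduced.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(rows, cols):
    """Plug-in joint entropy (in nats) of the columns `cols` of `rows`."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum((k / n) * log(k / n) for k in counts.values())


def multivariate_information(rows, cols):
    """Inclusion-exclusion multivariate information over the columns `cols`.

    Assumed convention: I = -sum over non-empty subsets S of (-1)^|S| H(S),
    which reduces to mutual information H(X) + H(Y) - H(X, Y) for two columns.
    """
    total = 0.0
    for size in range(1, len(cols) + 1):
        sign = -((-1) ** size)  # +1 for odd-sized subsets, -1 for even-sized
        for subset in combinations(cols, size):
            total += sign * entropy(rows, subset)
    return total


# Toy data (illustrative only): X and Y are independent fair bits, Z = X XOR Y.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
i_xy = multivariate_information(rows, (0, 1))
i_xyz = multivariate_information(rows, (0, 1, 2))
```

On this toy dataset the pairwise information between X and Y vanishes, while the three-variable information equals $-\log 2$. A negative three-point term of this kind is what arises when two independent variables jointly determine a third, as at a collider X → Z ← Y.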
