An efficient search-and-score algorithm for ancestral graphs using multivariate information scores

Nikita Lagrange, Herve Isambert

TL;DR

This work tackles causal discovery for ancestral graphs that arise from latent variables by deriving a likelihood decomposition in terms of multivariate cross-information over ac-connected subsets, enabling estimation from observational data. It develops a two-step, locally computed search-and-score algorithm, MIIC_search&score, that starts from MIIC predictions and uses higher-order local information scores (via $H$ and $I$ terms) along with a normalized maximum likelihood regularization to prune edges and orient them while avoiding cycles. Theoretical results (Theorem 1 and Proposition 3) show that the likelihood can be expressed as a sum over ac-connected subsets, and that empirical estimation of these contributions is feasible. Empirically, MIIC_search&score outperforms MIIC and FCI on challenging discrete datasets with latent variables, achieving higher precision with preserved recall and demonstrating robustness to sampling variation, thus offering a scalable approach to causal learning in complex graphs.

Abstract

We propose a greedy search-and-score algorithm for ancestral graphs, which include directed as well as bidirected edges, originating from unobserved latent variables. The normalized likelihood score of ancestral graphs is estimated in terms of multivariate information over relevant "ac-connected subsets" of vertices, C, that are connected through collider paths confined to the ancestor set of C. For computational efficiency, the proposed two-step algorithm relies on local information scores limited to the close surrounding vertices of each node (step 1) and edge (step 2). This computational strategy, although restricted to information contributions from ac-connected subsets containing up to two-collider paths, is shown to outperform state-of-the-art causal discovery methods on challenging benchmark datasets.
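
The multivariate information terms mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard inclusion-exclusion convention, $I(\bm{C}) = -\sum_{\bm{S}\subseteq\bm{C},\,\bm{S}\neq\emptyset} (-1)^{|\bm{S}|} H(\bm{S})$, estimated from empirical frequencies of a single discrete dataset; it is not the paper's cross-information score, which also involves the model distribution and a normalized maximum likelihood correction. The function names `entropy` and `multivariate_information` and the toy XOR data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Empirical joint entropy (in bits) of the columns `cols` of `rows`."""
    n = len(rows)
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def multivariate_information(rows, cols):
    """Inclusion-exclusion estimate: I(C) = -sum_{S != {}} (-1)^|S| H(S).

    Assumption: this is the textbook multivariate information, used here
    as a stand-in for the information terms named in the abstract.
    """
    total = 0.0
    for size in range(1, len(cols) + 1):
        sign = -((-1) ** size)  # +1 for odd-sized subsets, -1 for even
        for subset in combinations(cols, size):
            total += sign * entropy(rows, subset)
    return total


# Toy collider X -> Z <- Y with Z = X xor Y and uniform, independent X, Y.
xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
pair_info = multivariate_information(xor_rows, (0, 1))       # I(X;Y)
triple_info = multivariate_information(xor_rows, (0, 1, 2))  # I(X;Y;Z)
print(pair_info, triple_info)
```

In this toy collider the two parents are marginally independent, so the pairwise term vanishes, while the three-point term is negative. That sign pattern is the usual information-theoretic signature of an unshielded collider.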

Paper Structure

This paper contains 23 sections, 31 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Cross-entropy decomposition of ancestral graphs. Examples of cross-entropy decomposition of ancestral graphs (red edges, lhs) in terms of relevant multivariate cross-information contributions $I(\bm{C})$ with $\bm{C}\subseteq\bm{V}$ (red nodes, rhs). Simple graphs: (A) without unshielded colliders, (B) with a single or non-overlapping unshielded colliders, (C) with overlapping unshielded colliders through three or more (conditionally) independent parents or (D) through a two-(or more)-collider path. (E) Bayesian graph corresponding to the head-and-tail factorization of the two-collider path in (D) estimated using the empirical distribution $p(.)$, see Appendix C. (F) Simple Bayesian graph not Markov equivalent to an ancestral graph (G) sharing the same edges and unshielded collider [ali2009]. Solid black edges correspond to direct connections or collider paths confined to the corresponding $ac$-connected subset $\bm{C}$, while wiggly edges indicate collider paths extending beyond $\bm{C}$ yet indirectly connected to $\bm{C}$ by an ancestor path, marked with dashed edges, see Definition 2. By contrast, graphs (H) and (I) illustrate the fact that collider paths may be neither unique nor conserved between two Markov equivalent graphs (i.e. sharing the same cross-information terms) [ali2009].
  • Figure 2: Benchmark results on ancestral graphs of increasing complexity. Benchmark results are averaged over 50 independent categorical datasets from ancestral graphs obtained by hiding 0%, 5%, 10% or 20% of variables in Discrete Bayesian Networks of increasing complexity (see main text): Alarm, Insurance, Barley and Mildew. MIIC_search&score results are compared to FCI [zheng2024] and to the MIIC results used as the starting point for MIIC_search&score. Causal discovery performance is assessed in terms of Precision and Recall relative to the theoretical PAGs; edges that are correctly predicted but whose orientation differs from the directed or bidirected edges of the PAG are counted as false positives. Error bars: 95% confidence interval.
  • Figure 3: Benchmark results on bootstrap datasets from ancestral graphs of increasing complexity. Bootstrap sensitivity analysis to sampling noise, based on 30 independent resamplings with replacement of single datasets of increasing sizes. Ancestral graphs are obtained by hiding 0%, 5%, 10% or 20% of variables in Discrete Bayesian Networks of increasing complexity (see main text): Alarm, Insurance, Barley and Mildew. MIIC_search&score results are compared to FCI [zheng2024] and to the MIIC results used as the starting point for MIIC_search&score. FCI results are reported only for Alarm at all sample sizes tested ($N\leqslant 20,000$) and for Insurance at small sample sizes ($N\leqslant 1,000$), because FCI had difficulty converging on the other bootstrapped datasets. Causal discovery performance is assessed in terms of Precision and Recall relative to the theoretical PAGs; edges that are correctly predicted but whose orientation differs from the directed or bidirected edges of the PAG are counted as false positives. Error bars: 95% confidence interval.
  • Figure 4: Simple ancestral graphs.
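
The orientation-sensitive scoring described in the captions of Figures 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: edges are given as hypothetical `(u, v, mark)` triples, an edge counts as a true positive only when both its endpoints and its end marks match the reference graph, and a misoriented edge is also counted as a missed true edge when computing Recall (the captions only specify that it counts as a false positive).

```python
# Mirror image of each end-mark pattern when the endpoints are swapped.
FLIP = {"->": "<-", "<-": "->", "<->": "<->", "--": "--",
        "o-o": "o-o", "o->": "<-o", "<-o": "o->"}


def normalize(u, v, mark):
    """Store each edge under a sorted key so that u -> v equals v <- u."""
    if u <= v:
        return (u, v), mark
    return (v, u), FLIP[mark]


def precision_recall(true_edges, pred_edges):
    """Orientation-sensitive Precision and Recall.

    Assumption: a predicted edge with the right endpoints but different
    end marks is a false positive, and the true edge it failed to match
    is a false negative.
    """
    true = dict(normalize(*edge) for edge in true_edges)
    pred = dict(normalize(*edge) for edge in pred_edges)
    tp = sum(1 for key, mark in pred.items() if true.get(key) == mark)
    fp = len(pred) - tp  # wrong orientation or spurious edge
    fn = len(true) - tp  # true edges not recovered with the right marks
    precision = tp / (tp + fp) if pred else 0.0
    recall = tp / (tp + fn) if true else 0.0
    return precision, recall


# Hypothetical reference graph and prediction.
reference = [("A", "B", "->"), ("B", "C", "<->"), ("C", "D", "->")]
predicted = [("B", "A", "<-"),   # same as A -> B: true positive
             ("B", "C", "->"),   # right skeleton, wrong marks: false positive
             ("D", "E", "->")]   # spurious edge: false positive
precision, recall = precision_recall(reference, predicted)
print(precision, recall)
```

Scoring on end marks rather than on the skeleton alone penalizes a method that finds the right adjacencies but orients them incorrectly, which is the stricter criterion the captions describe.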