Extremely Greedy Equivalence Search

Achille Nazaret, David Blei

TL;DR

The paper addresses causal structure discovery from finite data, where GES often fails to recover the Markov Equivalence Class (MEC), especially in dense graphs. It introduces eXtremely Greedy Equivalence Search (XGES) in two stages: (i) XGES-0, a search that interleaves insertions and deletions while prioritizing deletions, and that preserves the theoretical guarantees of GES, and (ii) XGES, which further refines the MEC by testing deletions of early-inserted edges to escape local maxima. The authors develop an efficient CPDAG-based implementation with score-update caching and validity propagation that scales to larger, denser graphs. On simulated data, XGES outperforms GES and its variants in accuracy (SHD, F1) and in speed (XGES-0 is up to 30 times faster than GES, and XGES up to 10 times faster). Open-source Python and C++ implementations are provided, which supports use in larger-scale causal discovery tasks where the infinite-data assumption does not hold.

Abstract

The goal of causal discovery is to learn a directed acyclic graph from data. One of the most well-known methods for this problem is Greedy Equivalence Search (GES). GES searches for the graph by incrementally and greedily adding or removing edges to maximize a model selection criterion. It has strong theoretical guarantees on infinite data but can fail in practice on finite data. In this paper, we first identify some of the causes of GES's failure, finding that it can get blocked in local optima, especially in denser graphs. We then propose eXtremely Greedy Equivalence Search (XGES), which involves a new heuristic to improve the search strategy of GES while retaining its theoretical guarantees. In particular, XGES favors deleting edges early in the search over inserting edges, which reduces the possibility of the search ending in local optima. A further contribution of this work is an efficient algorithmic formulation of XGES (and GES). We benchmark XGES on simulated datasets with known ground truth. We find that XGES consistently outperforms GES in recovering the correct graphs, and it is 10 times faster. XGES implementations in Python and C++ are available at https://github.com/ANazaret/XGES.
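To make the "delete before insert" idea concrete, here is a minimal toy sketch. It is not the authors' algorithm: it hill-climbs over individual DAGs rather than over Markov equivalence classes, and the function names (`local_bic`, `has_path`, `delete_first_hill_climb`), the penalty form, and the default `alpha=2.0` are all assumptions made for illustration. At every step it applies the best score-improving deletion if one exists, and only otherwise considers insertions.

```python
import itertools
import numpy as np

def local_bic(X, j, parents, alpha=2.0):
    """Penalized Gaussian log-likelihood of column j given its parents
    (additive constants dropped; penalty form is an assumption)."""
    n = X.shape[0]
    y = X[:, j] - X[:, j].mean()
    if parents:
        P = X[:, sorted(parents)]
        P = P - P.mean(axis=0)
        coef, *_ = np.linalg.lstsq(P, y, rcond=None)
        resid = y - P @ coef
    else:
        resid = y
    sigma2 = resid @ resid / n
    return -0.5 * n * np.log(sigma2) - 0.5 * alpha * np.log(n) * len(parents)

def has_path(parents, src, dst):
    """True if a directed path src -> ... -> dst exists (walks back from dst)."""
    stack, seen = [dst], set()
    while stack:
        v = stack.pop()
        if v == src:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def delete_first_hill_climb(X, alpha=2.0, tol=1e-9):
    """Toy DAG-space search: deletions take priority over insertions."""
    d = X.shape[1]
    parents = {j: set() for j in range(d)}
    while True:
        best_gain, best_move = tol, None
        # 1) Look for the best improving deletion i -> j.
        for j in range(d):
            base = local_bic(X, j, parents[j], alpha)
            for i in parents[j]:
                gain = local_bic(X, j, parents[j] - {i}, alpha) - base
                if gain > best_gain:
                    best_gain, best_move = gain, ("del", i, j)
        # 2) Only if no deletion helps, look for the best improving insertion.
        if best_move is None:
            for i, j in itertools.permutations(range(d), 2):
                # Skip existing edges and insertions that would create a cycle.
                if i in parents[j] or has_path(parents, j, i):
                    continue
                base = local_bic(X, j, parents[j], alpha)
                gain = local_bic(X, j, parents[j] | {i}, alpha) - base
                if gain > best_gain:
                    best_gain, best_move = gain, ("ins", i, j)
        if best_move is None:
            return parents  # local optimum reached
        op, i, j = best_move
        if op == "del":
            parents[j].discard(i)
        else:
            parents[j].add(i)

# Simulated chain X0 -> X1 -> X2 (synthetic data for illustration only).
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 2.0 * x0 + rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

parents = delete_first_hill_climb(X)
skeleton = {frozenset((i, j)) for j, ps in parents.items() for i in ps}
print(sorted(tuple(sorted(e)) for e in skeleton))
```

On this small chain no deletion is ever useful, so the sketch only shows the control flow. The edge orientations it returns may differ from the generating DAG while still lying in the same equivalence class, which is why the example inspects the skeleton.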

Paper Structure

This paper contains 50 sections, 15 theorems, 14 equations, 14 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

For $\alpha>0$, the BIC for Gaussian linear models is locally consistent once $n$ is large enough.
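As a hedged reading of what this theorem asserts: the score below is one common way to write a BIC with penalty strength $\alpha$, and the two implications are the standard definition of local consistency from Chickering (2002). The symbols $S$, $\hat\theta$, $|\hat\theta|$, $\mathrm{Pa}_j^G$, and $G'$ are notation assumed here, and the paper's exact parameterization may differ.

$$
S(G; D) = \log p(D \mid \hat\theta, G) - \frac{\alpha}{2}\,|\hat\theta|\,\log n
$$

Let $G'$ be the DAG obtained from $G$ by adding the edge $X_i \to X_j$. Local consistency then requires, for $n$ large enough,

$$
X_j \not\perp\!\!\!\perp X_i \mid \mathrm{Pa}_j^G \;\Rightarrow\; S(G'; D) > S(G; D),
\qquad
X_j \perp\!\!\!\perp X_i \mid \mathrm{Pa}_j^G \;\Rightarrow\; S(G'; D) < S(G; D).
$$

In words, adding an edge that removes a false independence raises the score, and adding an edge that is not needed lowers it.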

Figures (14)

  • Figure 1: Illustration of insertions from a MEC A to MECs B or C: (i) choose a DAG in A, (ii) insert the edge $2\leftarrow3$ to obtain another DAG, (iii) consider its MEC. Each MEC has all its DAGs on a white plate, and its canonical PDAG on a gray plate (see the implementation section for a definition).
  • Figure 2: Performance comparison of GES and XGES variants, measured with SHD for different edge densities $\rho$. XGES heuristics outperform GES and its variants in all scenarios. The dashed lines indicate the number of edges of the true graph. Each boxplot is computed over 30 seeds.
  • Figure 3: Empirical study of GES failure, on 90 simulated datasets with varying numbers of variables $d$ and graph densities $\rho$. (left) Differences in BIC between GES and ground truth are negative: GES does not find the score's global maximum. (right) Ratios of GES edges to true edges exceed 1: GES returns many more edges than the true graph.
  • Figure 4: (left) The BIC scores of the graphs returned by each method are strongly correlated with the SHD to ground truth (shown for $d=50$, $\rho=3$, 30 seeds). XGES finds the highest scores and lowest SHDs. (right) Runtime of GES and XGES for a wide range of $d$. XGES-0 is up to 30 times faster than GES, and XGES up to 10 times faster. fGES may have overhead due to Java while other methods are in C++.
  • Figure 5: Performance of GES and XGES when varying (left) the number of samples $n$, and (right) the regularization strength $\alpha$. Increasing $n$ improves XGES while it hurts GES and its variants. Increasing $\alpha$ initially improves GES but eventually hurts all methods. The dashed lines indicate the number of edges of the true graph. Error bars over 30 seeds.
  • ...and 9 more figures
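Several of the figures above report the structural Hamming distance (SHD). As a minimal sketch, assuming one common convention (each pair of nodes whose edge status differs between the two graphs counts once, whether the edge is missing, extra, or oriented differently), SHD can be computed from adjacency matrices as follows. The function name `shd` and the matrix encoding are assumptions for illustration; the paper's exact convention may differ.

```python
import numpy as np

def shd(A, B):
    """Structural Hamming distance under the convention described above.

    A[i, j] = 1 encodes a directed edge i -> j; an undirected edge
    sets both A[i, j] and A[j, i].
    """
    A = np.asarray(A, dtype=bool)
    B = np.asarray(B, dtype=bool)
    mismatch = A != B
    # A node pair differs if either direction of the pair differs.
    pair_mismatch = mismatch | mismatch.T
    # Count each unordered pair once using the strict upper triangle.
    return int(np.triu(pair_mismatch, k=1).sum())

# True graph: 0 -> 1 -> 2.
true_graph = np.array([[0, 1, 0],
                       [0, 0, 1],
                       [0, 0, 0]])
# Estimate: 0 -> 1 kept, 1 -> 2 reversed, and an extra edge 0 -> 2.
estimate = np.array([[0, 1, 1],
                     [0, 0, 0],
                     [0, 1, 0]])

print(shd(true_graph, estimate))
```

Here the reversed edge and the extra edge each contribute one unit, while the shared edge $0 \to 1$ contributes nothing.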

Theorems & Definitions (37)

  • Remark 1
  • Definition 1
  • Definition 2: Local Consistency (Chickering, 2002)
  • Theorem 1: Local Consistency of BIC (Haughton, 1988; Chickering, 2002)
  • Theorem 2
  • Remark 2
  • Theorem 3
  • Theorem 4 (Verma and Pearl, 1991)
  • Theorem 5
  • proof
  • ...and 27 more