Embracing Discrete Search: A Reasonable Approach to Causal Structure Learning

Marcel Wienöbst, Leonard Henckel, Sebastian Weichwald

TL;DR

The paper tackles causal structure learning from observational data under causal sufficiency by advancing discrete, score-based search over DAGs for linear additive noise models. It introduces FLOP, a fast algorithm that embraces discrete search with four innovations: warm-starting parent selection from the previous order, dynamic Cholesky-based score updates, a principled initial node order, and Iterated Local Search with reinsertion moves. Empirical results show FLOP achieves state-of-the-art finite-sample accuracy at practical run-times, often outperforming continuous relaxations and traditional discrete methods, and scaling to graphs well beyond previous exact-search limits. The work highlights that increasing search depth via budgeted discrete optimization can yield substantial improvements in structure recovery, motivating a renewed emphasis on discrete methods and configurable compute budgets in causal discovery.
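
The iterated local search with reinsertion moves mentioned above can be illustrated generically. The following is a minimal sketch, not FLOP's implementation: it assumes a caller-supplied `score(order)` where lower is better, uses first-improvement hill climbing, and perturbs with two random swaps. The function names, the perturbation scheme, and the toy `inversions` score are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[int]], float]

def reinsert(order: Sequence[int], i: int, j: int) -> List[int]:
    """Return a copy of `order` with the element at position i moved to position j."""
    new = list(order)
    node = new.pop(i)
    new.insert(j, node)
    return new

def local_search(order: Sequence[int], score: Score) -> Tuple[List[int], float]:
    """First-improvement hill climbing over reinsertion moves (lower is better)."""
    best, best_score = list(order), score(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            for j in range(len(best)):
                if i == j:
                    continue
                cand = reinsert(best, i, j)
                s = score(cand)
                if s < best_score:
                    best, best_score = cand, s
                    improved = True
                    break
            if improved:
                break
    return best, best_score

def iterated_local_search(order: Sequence[int], score: Score,
                          restarts: int, seed: int = 0) -> Tuple[List[int], float]:
    """Alternate perturbation and local search, keeping the best order seen."""
    rng = random.Random(seed)
    best, best_score = local_search(order, score)
    for _ in range(restarts):
        cand = list(best)
        for _ in range(2):  # assumed perturbation: two random swaps
            a, b = rng.sample(range(len(cand)), 2)
            cand[a], cand[b] = cand[b], cand[a]
        cand, s = local_search(cand, score)
        if s < best_score:
            best, best_score = cand, s
    return best, best_score

def inversions(order: Sequence[int]) -> int:
    """Toy score (hypothetical): number of out-of-order pairs."""
    return sum(1 for x in range(len(order)) for y in range(x + 1, len(order))
               if order[x] > order[y])
```

In a structure-learning setting, `score` would evaluate the best graph consistent with a given node order; the `restarts` argument plays the role of the compute budget that distinguishes the FLOP variants in the figures.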

Abstract

We present FLOP (Fast Learning of Order and Parents), a score-based causal discovery algorithm for linear models. It pairs fast parent selection with iterative Cholesky-based score updates, cutting run-times over prior algorithms. This makes it feasible to fully embrace discrete search, enabling iterated local search with principled order initialization to find graphs with scores at or close to the global optimum. The resulting structures are highly accurate across benchmarks, with near-perfect recovery in standard settings. This performance calls for revisiting discrete search over graphs as a reasonable approach to causal discovery.
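
The link between a Cholesky factorization and a linear-model score can be sketched as follows. This assumes a Gaussian BIC-style local score, which this summary does not spell out. If a covariance submatrix is ordered as parents followed by the node, the squared last diagonal entry of its Cholesky factor equals the residual variance of regressing the node on its parents. The sketch recomputes the factor from scratch in pure Python; the incremental updates the paper describes are not reproduced, and all function names are hypothetical. In practice a library routine such as `numpy.linalg.cholesky` would replace the hand-written factorization.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == a, for symmetric positive definite a."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def residual_variance(cov: Matrix, node: int, parents: Sequence[int]) -> float:
    """Variance of `node` left unexplained by a linear fit on `parents`."""
    idx = list(parents) + [node]  # node goes last
    sub = [[cov[r][c] for c in idx] for r in idx]
    L = cholesky(sub)
    return L[-1][-1] ** 2  # Schur complement of the parent block

def local_bic(cov: Matrix, n_samples: int, node: int, parents: Sequence[int]) -> float:
    """Assumed BIC-style local score, up to additive constants (lower is better)."""
    rv = residual_variance(cov, node, parents)
    return n_samples * math.log(rv) + math.log(n_samples) * len(parents)
```

For two standardized variables with correlation 0.5, the residual variance of one given the other is 1 - 0.5² = 0.75, which matches the squared last diagonal entry of the factor.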

Paper Structure

This paper contains 21 sections, 2 theorems, 7 equations, 10 figures, and 3 algorithms.

Key Result

Lemma 3.2

Let data set $D$ consist of $n$ i.i.d. observations of a probability distribution represented by a Bayesian network over variables $X_1, \dots, X_p$. Then, in the large sample limit of $n$, grow-shrink finds the restricted Markov boundary of node $v$ relative to a set $Z \subseteq \{X_1, \dots, X_p\
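
A grow-shrink procedure of the kind the lemma refers to can be sketched generically. This is an illustrative score-based version under assumptions, not the paper's algorithm: it takes a hypothetical callable `score(subset)` where lower is better, greedily adds the candidate from the restricting set that most improves the score, then greedily removes members while removal still improves it. The optional `init` argument is an assumed way to model a warm start from a previously selected set.

```python
from typing import Callable, FrozenSet, Iterable, Set, Tuple

def grow_shrink(candidates: Iterable[int],
                score: Callable[[FrozenSet[int]], float],
                init: Iterable[int] = ()) -> Tuple[Set[int], float]:
    """Greedy grow phase followed by greedy shrink phase (lower score is better)."""
    pool = set(candidates)
    current = set(init) & pool  # a warm start must respect the candidate set
    best = score(frozenset(current))

    # Grow: add the single best candidate while the score strictly improves.
    while True:
        trials = [(score(frozenset(current | {c})), c) for c in sorted(pool - current)]
        if not trials:
            break
        s, c = min(trials)
        if s >= best:
            break
        current.add(c)
        best = s

    # Shrink: drop the single best removal while the score strictly improves.
    while True:
        trials = [(score(frozenset(current - {c})), c) for c in sorted(current)]
        if not trials:
            break
        s, c = min(trials)
        if s >= best:
            break
        current.remove(c)
        best = s

    return current, best

def toy_score(subset: FrozenSet[int]) -> int:
    """Toy score (hypothetical): distance to a known target set {1, 3}."""
    return len(subset ^ {1, 3})
```

With the toy score, both a cold start and a warm start from `{0, 1}` end at `{1, 3}`; the warm start needs the shrink phase to discard node 0.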

Figures (10)

  • Figure 1: Run-time plotted against Structural Hamming Distance (left) and Ancestor Adjustment Identification Distance of Henckel et al. (right) between the CPDAGs learned on linear ANM data and the target CPDAG corresponding to the underlying Erdős-Rényi generated DAG with 50 nodes, average degree 8 and 1000 samples drawn. Every point corresponds to one of 50 random instances; diamonds indicate averages. FLOP variants differ in the number of ILS restarts to escape local optima. The fraction of instances with exact CPDAG recovery is 40% for BOSS and $\text{FLOP}_0$ and 60% for $\text{FLOP}_{20}$ and $\text{FLOP}_{100}$, and zero for the remaining algorithms.
  • Figure 2: Run-time in seconds, averaged over 50 repetitions with standard-deviation error bars, for ER graphs with average degree 16, 1000 samples, and $\{50, 100, 150, \dots, 500\}$ nodes.
  • Figure 3: Run-time plotted against SHD on paths with 50 nodes for 1000 samples (left) and ER graphs with 50 nodes and average degree 16 for 50,000 samples (right). For the path graph, $\text{FLOP}_0$ finds the target graph in 72% of instances, PC in 32%, GES in 66% and the remaining algorithms in none; for the ER graphs, $\text{FLOP}_{20}$ does so in 26% of cases, $\text{FLOP}_{100}$ in 50%, $\text{FLOP}_{500}$ in 56%, Exact in 58%, $\text{BOSS}_{100}$ in 4% and the remaining algorithms in none.
  • Figure 4: Run-time plotted against SHD on SF graphs (left) and the Alarm network (right), both for 1000 samples. For the SF graphs, $\text{FLOP}_{20}$ finds the target CPDAG in 6% of cases, $\text{FLOP}_{100}$ in 10%, the remaining algorithms in none; for the Alarm network, $\text{FLOP}_{0}$ does so in 2% of cases, $\text{FLOP}_{20}$ in 74%, $\text{FLOP}_{100}$ in 82%, BOSS in 6%, GES in 16%, DAGMA and PC in none.
  • Figure 5: Run-time against SHD for data sampled with uniform instead of Gaussian noise on the left and for unstandardized data on the right. In the uniform noise case, $\text{FLOP}_0$ and BOSS find the target CPDAG in 34% of cases, $\text{FLOP}_{20}$ and $\text{FLOP}_{100}$ in 54% of the cases, the remaining algorithms in none. On unstandardized data, BOSS finds the target CPDAG in 22% of cases, $\text{FLOP}_0$ in 34% and $\text{FLOP}_{20}$ and $\text{FLOP}_{100}$ in 54% of cases, the remaining algorithms in none.
  • ...and 5 more figures
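
The Structural Hamming Distance (SHD) used in the captions above can be computed in a few lines. This is a minimal sketch under an assumed convention, since SHD definitions vary: a CPDAG is stored as a 0/1 adjacency matrix in which `a[i][j] == 1` alone means a directed edge from i to j, and `a[i][j] == a[j][i] == 1` means an undirected edge. Every node pair whose edge status differs counts as one error, so a reversed edge costs 1 rather than 2. The convention used for the reported numbers is not stated in this summary.

```python
from typing import List

def shd(a: List[List[int]], b: List[List[int]]) -> int:
    """Number of node pairs whose edge type (none, i->j, j->i, i-j) differs."""
    n = len(a)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (a[i][j], a[j][i]) != (b[i][j], b[j][i])
    )

# Hypothetical example graphs on three nodes.
learned = [[0, 1, 0],   # 0 -> 1
           [0, 0, 1],   # 1 - 2 (undirected, together with the row below)
           [0, 1, 0]]
target = [[0, 0, 1],    # 0 -> 2
          [1, 0, 1],    # 1 -> 0 and 1 - 2
          [0, 1, 0]]
```

Here the two graphs differ on the pairs (0, 1) and (0, 2) and agree on (1, 2), giving a distance of 2. An SHD of 0 corresponds to the exact CPDAG recovery reported as percentages in the captions.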

Theorems & Definitions (5)

  • Definition 3.1
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof