BUILD with Precision: Bottom-Up Inference of Linear DAGs

Hamed Ajorlou, Samuel Rey, Gonzalo Mateos, Geert Leus, Antonio G. Marques

TL;DR

The paper studies DAG structure learning from observational data under linear Gaussian SEMs with equal noise variances, a setting in which the DAG is identifiable. It introduces BUILD, a deterministic bottom-up algorithm that identifies leaf nodes from the population precision matrix Θ, recovers their parents, prunes the leaves, and updates Θ via a Schur complement before repeating. To handle finite-sample ill-conditioning, BUILD re-estimates Θ at intervals over the remaining variables, trading some runtime for robustness. On synthetic benchmarks, BUILD achieves strong edge-detection and weight-estimation performance, outperforming several baselines in SHD and FDR while maintaining competitive runtimes. The work highlights the accuracy-computation trade-off caused by precision estimation errors and points to future extensions, including adaptive refresh schedules and broader SEM classes.

Abstract

Learning the structure of directed acyclic graphs (DAGs) from observational data is a central problem in causal discovery, statistical signal processing, and machine learning. Under a linear Gaussian structural equation model (SEM) with equal noise variances, the problem is identifiable, and we show that the ensemble precision matrix of the observations exhibits a distinctive structure that facilitates DAG recovery. Exploiting this property, we propose BUILD (Bottom-Up Inference of Linear DAGs), a deterministic stepwise algorithm that identifies leaf nodes and their parents, then prunes the leaves by removing incident edges to proceed to the next step, exactly reconstructing the DAG from the true precision matrix. In practice, precision matrices must be estimated from finite data, and ill-conditioning may lead to error accumulation across BUILD steps. As a mitigation strategy, we periodically re-estimate the precision matrix (with fewer variables as leaves are pruned), trading off runtime for enhanced robustness. Reproducible results on challenging synthetic benchmarks demonstrate that BUILD compares favorably to state-of-the-art DAG learning algorithms, while offering an explicit handle on complexity.
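
The bottom-up procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that this page does not state: the SEM is written as x = A x + z, where A[i, k] is the weight of edge k → i, and the noise covariance is σ²I, so the population precision matrix is Θ = (I − A)ᵀ(I − A) / σ². Under that convention a leaf attains the smallest diagonal entry of Θ (namely 1/σ²), its parent weights are −Θ[j, k] / Θ[j, j], and marginalizing the leaf out is a Schur-complement update. The function name `build_sketch` and the threshold `tol` are hypothetical.

```python
import numpy as np

def build_sketch(theta, tol=1e-8):
    """Recover A (with x = A x + z) from a population precision matrix.

    Illustrative only: assumes equal noise variances and an exact theta.
    """
    theta = np.array(theta, dtype=float)
    d = theta.shape[0]
    active = list(range(d))      # original labels of the nodes still in play
    A = np.zeros((d, d))
    while len(active) > 1:
        # A leaf (no children) has the smallest diagonal entry, 1/sigma^2.
        j = int(np.argmin(np.diag(theta)))
        # For a leaf j, theta[j, k] / theta[j, j] = -A[j, k] for k != j.
        row = theta[j] / theta[j, j]
        for k, node in enumerate(active):
            if k != j and abs(row[k]) > tol:
                A[active[j], node] = -row[k]
        # Prune the leaf: the precision matrix of the remaining variables
        # is the Schur complement of theta[j, j].
        keep = [k for k in range(len(active)) if k != j]
        theta = (theta[np.ix_(keep, keep)]
                 - np.outer(theta[keep, j], theta[j, keep]) / theta[j, j])
        active.pop(j)
    return A

# Usage on a small random DAG with unit noise variance (assumed for the demo).
rng = np.random.default_rng(0)
d = 6
mask = rng.random((d, d)) < 0.5
A_true = np.tril(rng.uniform(0.5, 2.0, (d, d)) * mask, k=-1)
theta_true = (np.eye(d) - A_true).T @ (np.eye(d) - A_true)
A_hat = build_sketch(theta_true)
```

With finite data, Θ would be replaced by an estimate, and `tol` would need to be a meaningful threshold rather than a numerical tolerance. The periodic re-estimation mentioned in the abstract would amount to recomputing that estimate over the remaining (active) variables every few steps instead of relying only on the Schur-complement update.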

Paper Structure

This paper contains 5 sections, 4 equations, 1 figure, 1 table, 1 algorithm.

Figures (1)

  • Figure 1: Performance of BUILD compared with state-of-the-art baselines. Results are averaged over 10 trials, with shaded regions indicating the 10th and 90th percentiles. (a) NMSE of edge estimation as a function of sample size. (b) SHD as a function of sample size. (c) Comparison with order-based methods: SHD (left) and runtime in log scale (right) versus the number of nodes.

Theorems & Definitions (1)

  • proof