
Variational DAG Estimation via State Augmentation With Stochastic Permutations

Edwin V. Bonilla, Pantelis Elinas, He Zhao, Maurizio Filippone, Vassili Kitsios, Terry O'Kane

TL;DR

This work tackles Bayesian structure learning for directed acyclic graphs from observational data by introducing a state-augmentation strategy that jointly models node orderings (permutations) and DAGs. A permutation-conditioned DAG construction, paired with Gamma-ranking permutation models and differentiable relaxations (SoftSort, Gumbel-Max), enables tractable variational inference and explicit uncertainty quantification over graph structures. The method (VDESP) supports both linear and nonlinear SEMs, optimizing an ELBO with a factorized variational posterior over permutations and DAGs, and demonstrates competitive performance against a range of Bayesian and non-Bayesian baselines on synthetic, pseudo-real, and real datasets while providing meaningful posterior uncertainty. The approach offers a principled, uncertainty-aware framework for causal structure discovery with scalable gradient-based learning, and points to future improvements via stronger sparsity/hierarchical priors.

Abstract

Estimating the structure of a Bayesian network, in the form of a directed acyclic graph (DAG), from observational data is a statistically and computationally hard problem with essential applications in areas such as causal discovery. Bayesian approaches are a promising direction for solving this task, as they allow for uncertainty quantification and deal with well-known identifiability issues. From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying combinatorial space. We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations. We carry out posterior estimation via variational inference, where we exploit continuous relaxations of discrete distributions. We show that our approach performs competitively when compared with a wide range of Bayesian and non-Bayesian benchmarks on a range of synthetic and real datasets.
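The augmented space of DAGs and permutations rests on a standard fact: if edges are only allowed from nodes that appear earlier in an ordering to nodes that appear later, the resulting graph cannot contain a cycle. The sketch below illustrates that idea only. The function names (`dag_from_permutation`, `is_acyclic`) and the independent Bernoulli edge sampling with `edge_prob` are assumptions made for illustration, not the paper's model or its variational distributions.

```python
import random


def dag_from_permutation(perm, edge_prob=0.5, rng=None):
    """Sample an adjacency matrix that is consistent with a node ordering.

    perm[k] is the node placed at position k of the ordering. An edge i -> j
    is only considered when i appears before j in perm, so no cycle can form.
    Edges are drawn as independent Bernoulli(edge_prob) variables, which is
    an illustrative assumption rather than the paper's distribution.
    """
    if rng is None:
        rng = random.Random(0)
    d = len(perm)
    adj = [[0] * d for _ in range(d)]
    for a in range(d):
        for b in range(a + 1, d):
            if rng.random() < edge_prob:
                adj[perm[a]][perm[b]] = 1
    return adj


def is_acyclic(adj):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed
    once all of its parents have been removed."""
    d = len(adj)
    indeg = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    ready = [j for j in range(d) if indeg[j] == 0]
    removed = 0
    while ready:
        i = ready.pop()
        removed += 1
        for j in range(d):
            if adj[i][j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    ready.append(j)
    return removed == d


rng = random.Random(42)
perm = list(range(5))
rng.shuffle(perm)  # a random node ordering
adj = dag_from_permutation(perm, edge_prob=0.6, rng=rng)
print(is_acyclic(adj))
```

Because acyclicity holds by construction for every ordering, uncertainty over graphs can be expressed as uncertainty over the ordering plus uncertainty over which order-respecting edges are present.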

Paper Structure (42 sections, 24 equations, 8 figures)


Figures (8)

  • Figure 1: Results on synthetic linear (top) and nonlinear (bottom) data with $D=16$, on all graphs: the structural Hamming distance (SHD, the lower the better); the F1 score (the higher the better); and the number of non-zeros (NNZ, the closer to $\bar{E}=16$ the better). The results for DECI, JSP-GFN and DDS were too poor and are consequently not shown here. GRANDAG, NOTEARS, BCDNET and BAYESDAG are referred to as GRDAG, NTRS, BCD and BDAG, respectively.
  • Figure 2: Results on real datasets: DREAM4 (left), SACHS (middle) and SYNTREN (right). The F1 score (the higher the better) is computed on the classification problem of predicting links, including directionality. See \ref{fig:results-real-all} in the appendix for SHD values. Method names as in \ref{fig:results-synthetic}.
  • Figure 3: Expected calibration error on synthetic data.
  • Figure 4: Results on the synthetic linear data with $D=16$ variables (nodes) on ER (left), SF (middle), and all (right) graphs. The top row is with $\bar{E}=16$ edges and the bottom row with $\bar{E}=64$ edges, respectively.
  • Figure 5: Results on the synthetic nonlinear data with $D=16$ and $E=16$ on ER (left), SF (middle), and all (right) graphs.
  • ...and 3 more figures
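
The captions above report SHD, F1 over directed links, and NNZ. The following is a minimal sketch of how such metrics can be computed from two binary adjacency matrices. It assumes one common SHD convention, in which a reversed edge counts as a single error; the paper's exact convention is not stated in this section, and the function names are illustrative.

```python
def shd(true_adj, est_adj):
    """Structural Hamming distance between two directed graphs.

    Each unordered node pair contributes at most one error, so a missing,
    extra, or reversed edge all count as 1 (an assumed convention).
    """
    d = len(true_adj)
    dist = 0
    for i in range(d):
        for j in range(i + 1, d):
            true_pair = (true_adj[i][j], true_adj[j][i])
            est_pair = (est_adj[i][j], est_adj[j][i])
            if true_pair != est_pair:
                dist += 1
    return dist


def directed_f1(true_adj, est_adj):
    """F1 score treating every ordered pair (i, j), i != j, as a binary
    edge-prediction problem, so direction matters."""
    d = len(true_adj)
    tp = fp = fn = 0
    for i in range(d):
        for j in range(d):
            if i == j:
                continue
            t, e = true_adj[i][j], est_adj[i][j]
            if t and e:
                tp += 1
            elif e and not t:
                fp += 1
            elif t and not e:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def nnz(adj):
    """Number of non-zero entries, i.e. the number of predicted edges."""
    return sum(1 for row in adj for v in row if v)


# True graph: 0 -> 1, 1 -> 2.
true_adj = [[0, 1, 0],
            [0, 0, 1],
            [0, 0, 0]]
# Estimate: 1 -> 0 (reversed), 1 -> 2 (correct), 0 -> 2 (extra).
est_adj = [[0, 0, 1],
           [1, 0, 1],
           [0, 0, 0]]

print(shd(true_adj, est_adj))          # 2: one reversed edge, one extra edge
print(directed_f1(true_adj, est_adj))  # 0.4: tp=1, fp=2, fn=1
print(nnz(est_adj))                    # 3
```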