Variational DAG Estimation via State Augmentation With Stochastic Permutations
Edwin V. Bonilla, Pantelis Elinas, He Zhao, Maurizio Filippone, Vassili Kitsios, Terry O'Kane
TL;DR
This work tackles Bayesian structure learning for directed acyclic graphs (DAGs) from observational data by introducing a state-augmentation strategy that jointly models node orderings (permutations) and DAGs. A permutation-conditioned DAG construction, paired with Gamma-ranking permutation models and differentiable relaxations (SoftSort, Gumbel-Max), enables tractable variational inference and explicit uncertainty quantification over graph structures. The method (VDESP) supports both linear and nonlinear structural equation models (SEMs) and optimizes an evidence lower bound (ELBO) with a factorized variational posterior over permutations and DAGs. It performs competitively against a range of Bayesian and non-Bayesian baselines on synthetic, pseudo-real, and real datasets while providing meaningful posterior uncertainty. The approach offers a principled, uncertainty-aware framework for causal structure discovery with scalable gradient-based learning, and it points to stronger sparsity or hierarchical priors as a direction for future improvement.
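To make the role of a differentiable relaxation concrete, the following is a minimal sketch of the SoftSort operator in plain Python. It is an illustration only and not the VDESP implementation: the function names `soft_sort` and `hard_permutation`, the temperature value, and the example scores are all assumptions made here. Each row of the output is a softmax over the negative absolute differences between one sorted score and all scores, so the matrix is row-stochastic and approaches a hard permutation matrix as the temperature `tau` shrinks.

```python
import math


def soft_sort(scores, tau=1.0):
    """Row-stochastic relaxation of the permutation matrix that sorts
    `scores` in descending order (illustrative sketch, not the paper's code)."""
    sorted_scores = sorted(scores, reverse=True)
    rows = []
    for target in sorted_scores:
        # Logits are negative distances between this sorted value and every score.
        logits = [-abs(target - s) / tau for s in scores]
        shift = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - shift) for l in logits]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def hard_permutation(soft_matrix):
    """Recover an ordering by taking the argmax of each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft_matrix]


# Hypothetical scores for three nodes; a small tau gives a near-hard matrix.
scores = [0.3, 2.0, -1.0]
relaxed = soft_sort(scores, tau=0.05)
order = hard_permutation(relaxed)
```

With a larger `tau` the rows spread their mass over several nodes, which is what allows gradients to flow through the ordering during optimization.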
Abstract
Estimating the structure of a Bayesian network, in the form of a directed acyclic graph (DAG), from observational data is a statistically and computationally hard problem with essential applications in areas such as causal discovery. Bayesian approaches are a promising direction for solving this task, as they allow for uncertainty quantification and deal with well-known identifiability issues. From a probabilistic inference perspective, the main challenges are (i) representing distributions over graphs that satisfy the DAG constraint and (ii) estimating a posterior over the underlying combinatorial space. We propose an approach that addresses these challenges by formulating a joint distribution on an augmented space of DAGs and permutations. We carry out posterior estimation via variational inference, where we exploit continuous relaxations of discrete distributions. We show that our approach performs competitively when compared with a wide range of Bayesian and non-Bayesian benchmarks on both synthetic and real datasets.
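The reason an augmented space of permutations and DAGs handles the acyclicity constraint can be sketched in a few lines of Python. The example below is an illustration under stated assumptions rather than the paper's parameterization: `dag_from_ordering`, `is_acyclic`, the node ordering, and the edge mask are all hypothetical. It relies only on the standard fact that a graph whose edges all point from earlier to later positions in some node ordering cannot contain a cycle.

```python
from collections import deque


def dag_from_ordering(order, upper_mask):
    """Build an adjacency matrix from a node ordering and an edge mask.

    order[k] is the node placed at position k. upper_mask[i][j], for i < j,
    says whether to keep an edge from position i to position j. Entries on
    or below the diagonal are ignored, so every edge respects the ordering.
    """
    n = len(order)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if upper_mask[i][j]:
                adj[order[i]][order[j]] = 1
    return adj


def is_acyclic(adj):
    """Check acyclicity with Kahn's topological-sort algorithm."""
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = deque(j for j in range(n) if indegree[j] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in range(n):
            if adj[u][v]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    queue.append(v)
    return visited == n


# Hypothetical example: node 2 comes first, then node 0, then node 1.
order = [2, 0, 1]
upper_mask = [[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]]
adj = dag_from_ordering(order, upper_mask)
```

Here `adj` contains the edges 2→0, 2→1, and 0→1. Any choice of ordering and mask yields an acyclic graph, so a distribution over (ordering, mask) pairs induces a distribution supported only on DAGs.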
