Scaling Up Bayesian DAG Sampling

Daniele Nikzad, Alexander Zhilkin, Juha Harviainen, Jack Kuipers, Giusi Moffa, Mikko Koivisto

TL;DR

This work addresses scalable Bayesian learning of DAGs by accelerating Markov chain sampling in the space of graphs. It introduces Gibby, a fast basic-moves scheme that upper-bounds acceptance via an extended score and leverages geometric waiting times, along with efficient proposal generation and acyclicity checking, achieving substantial speedups over prior methods. It also develops epsilon-pruning to safely discard unlikely parent sets, drastically reducing the cost of moves that resample neighborhoods while preserving posterior accuracy within quantified bounds. Empirical evaluations show Gibby outperforming competing samplers on large networks and, when combined with pruning, enabling reliable posterior estimation where exact methods are infeasible. These techniques extend the practical reach of structure MCMC for Bayesian DAGs and have potential applications beyond DAG sampling in other combinatorial inference tasks.
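The geometric waiting-time idea mentioned above can be illustrated generically. If every step of a chain leaves the current state with probability `p`, the number of steps until the state changes is geometrically distributed, so it can be drawn in one shot instead of simulating each rejected proposal. The sketch below shows only this generic fact; it is not the paper's Gibby algorithm, and the function name `geometric_wait` and its parameters are hypothetical.

```python
import math
import random


def geometric_wait(p, rng):
    """Draw the number of Bernoulli(p) trials up to and including the first success.

    Uses inverse-CDF sampling, so the cost is independent of how small p is.
    (Hypothetical helper for illustration only.)
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if p == 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    # P(N > k) = (1 - p)^k, hence N = ceil(log(u) / log(1 - p)).
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [geometric_wait(0.25, rng) for _ in range(20000)]
    print(sum(draws) / len(draws))  # sample mean; the exact mean is 1 / p
```

A sampler can then weight the current state by the drawn waiting time and jump straight to the next accepted move, which pays off when acceptance probabilities are small.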

Abstract

Bayesian inference of Bayesian network structures is often performed by sampling directed acyclic graphs along an appropriately constructed Markov chain. We present two techniques to improve sampling. First, we give an efficient implementation of basic moves, which add, delete, or reverse a single arc. Second, we expedite summing over parent sets, an expensive task required for more sophisticated moves: we devise a preprocessing method to prune possible parent sets so as to approximately preserve the sums. Our empirical study shows that our techniques can yield substantial efficiency gains compared to previous methods.
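To make the basic moves concrete, here is a minimal sketch that lists every arc addition, deletion, and reversal that keeps a graph acyclic. It is a naive enumeration with a depth-first reachability check, written only to illustrate what the moves are; it is not the efficient implementation the paper describes. The representation (a dict mapping each node to its parent set) and the names `has_path` and `basic_moves` are assumptions made for this example.

```python
from itertools import permutations


def has_path(parents, src, dst):
    """Return True if a directed path src -> ... -> dst exists.

    parents[v] is the set of parents of node v; the search walks
    backwards from dst through parent sets.
    """
    stack, seen = [dst], set()
    while stack:
        v = stack.pop()
        if v == src:
            return True
        if v in seen:
            continue
        seen.add(v)
        stack.extend(parents[v])
    return False


def basic_moves(parents):
    """List all single-arc moves (add, delete, reverse) that keep the graph a DAG."""
    moves = []
    for u, v in permutations(parents, 2):
        if u in parents[v]:
            # Deleting an arc can never create a cycle.
            moves.append(("delete", u, v))
            # Reversing u -> v is valid iff no other path u -> ... -> v exists.
            parents[v].discard(u)
            reversible = not has_path(parents, u, v)
            parents[v].add(u)
            if reversible:
                moves.append(("reverse", u, v))
        elif v not in parents[u]:
            # Adding u -> v creates a cycle iff v already reaches u.
            if not has_path(parents, v, u):
                moves.append(("add", u, v))
    return moves


if __name__ == "__main__":
    chain = {"A": set(), "B": {"A"}, "C": {"B"}}  # A -> B -> C
    for move in basic_moves(chain):
        print(move)
```

On the chain A -> B -> C, adding C -> A is excluded because it would close a cycle, while adding A -> C is allowed. Each call here costs a graph search per candidate arc, which is the kind of overhead a fast acyclicity check aims to reduce.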

Paper Structure

This paper contains 31 sections, 3 theorems, 27 equations, 7 figures, 6 tables.

Key Result

Theorem 1

The $\epsilon$-pruning of $f$ is $\epsilon$-close to $f$.

Figures (7)

  • Figure 1: Performance on small networks. For each sampler, shown is the MAD of five independent runs as a function of running time. For each run, 100,000 DAGs were collected with even spacing.
  • Figure 2: Performance on large networks. For Gibby, the number of candidate parents $K$ was set as in Table \ref{table:pruning}. For each sampler, shown is the log posterior probability of the sampled DAGs in five independent runs. For Hailfinder, the score of the data-generating DAG $G^*$ is around $-56010$ and not visible. For Andes, Gibby spends the first hours on score computations before sampling.
  • Figure 3: Performance of DAG samplers on an Asia dataset of size 1,000. Top: The log posterior probability of the sampled DAG in five independent runs. The posterior probability of the data-generating DAG $G^*$ is marked by a horizontal black line. Bottom: The estimated arc posterior probabilities at the end of the runs, against the exact values. For each node pair, the best and the worst estimate over the five runs are shown.
  • Figure 4: Performance of DAG samplers on a Sachs dataset of size 1,000. Top: The log posterior probability of the sampled DAG in five independent runs. The posterior probability of the data-generating DAG $G^*$ is marked by a horizontal black line. Bottom: The estimated arc posterior probabilities at the end of the runs, against the exact values. For each node pair, the best and the worst estimate over the five runs are shown.
  • Figure 5: Performance of DAG samplers on a Child dataset of size 1,000. Top: The log posterior probability of the sampled DAG in five independent runs. The posterior probability of the data-generating DAG $G^*$ is marked by a horizontal black line. Bottom: The estimated arc posterior probabilities at the end of the runs, against the exact values. For each node pair, the best and the worst estimate over the five runs are shown.
  • ...and 2 more figures

Theorems & Definitions (7)

  • Definition 1: $\epsilon$-close
  • Definition 2: $\epsilon$-pruning
  • Theorem 1: Suppl. \ref{se:proof}
  • Theorem 2
  • Remark 1
  • Proposition 3
  • Proof