An Efficient Procedure for Computing Bayesian Network Structure Learning

Hongming Huang, Joe Suzuki

TL;DR

This paper tackles the NP-hard problem of Bayesian network (BN) structure learning with an in-memory, level-by-level algorithm that reaches the global optimum in a single traversal of the local structures. By combining the computation of optimal parent sets with sink-node identification, and by using a score-based objective built on the Quotient Jeffreys’ score, the method reduces peak memory to $O(\sqrt{p}2^p)$, where $p$ is the number of variables, while keeping time complexity comparable to prior approaches. Experiments on the Alarm dataset show lower memory usage and shorter runtime than existing methods, and a 28-variable BN can be learned using memory alone (disk remains an option at larger scales). The approach offers a path to exact BN structure learning with little or no disk use, which is relevant to larger graphical models in causal discovery and related domains.
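
As a rough sanity check on the $O(\sqrt{p}2^p)$ figure, the following sketch shows one way such a bound can arise. It assumes (this is an assumption, not something stated in the summary) that peak memory is dominated by holding a single level of variable subsets at a time, with on the order of $p$ stored values per subset. The largest level is the middle one, and Stirling's approximation gives:

$$
\max_{0 \le k \le p} \binom{p}{k} = \binom{p}{\lfloor p/2 \rfloor} \sim \sqrt{\frac{2}{\pi p}}\; 2^p,
\qquad
p \cdot \binom{p}{\lfloor p/2 \rfloor} \sim \sqrt{\frac{2p}{\pi}}\; 2^p = O(\sqrt{p}\,2^p).
$$

For $p = 28$, the middle level contains $\binom{28}{14} = 40{,}116{,}600$ subsets, roughly 15% of all $2^{28} = 268{,}435{,}456$ subsets, which illustrates why keeping only the levels needed for the current step can lower peak memory compared with storing every subset at once.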

Abstract

We propose a globally optimal Bayesian network structure discovery algorithm based on level-by-level scoring. Bayesian network structure discovery is a fundamental yet NP-hard problem in probabilistic graphical models, and memory usage grows exponentially with the number of variables. The simple and effective method of Silander and Myllymäki, which incrementally computes local scores to reach the global optimum, has been widely applied in this field. Existing methods that rely on disk storage can handle networks with more variables, but they suffer from latency, fragmentation, and the additional overhead of disk I/O. To avoid these problems, we investigate how to improve computational efficiency and reduce peak memory usage while using only memory. We introduce an efficient hierarchical computation method that traverses all local structures only once and retains only the data needed for the current computation, which improves efficiency and significantly reduces memory requirements. Experimental results indicate that, using only memory, our method both reduces peak memory usage and improves computational efficiency compared to existing methods, scales well to larger networks, and yields stable results. With this method, we learned a Bayesian network with 28 variables using only memory.
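
To make the idea of level-by-level computation concrete, here is a minimal Python sketch of a subset dynamic program in the spirit of Silander and Myllymäki that keeps only two adjacent levels in memory. It is not the authors' algorithm: the names `optimal_score`, `local_score`, and `toy_score` are hypothetical, the toy score stands in for a real data-driven score such as the Quotient Jeffreys' score, and the sketch returns only the optimal total score without reconstructing the network.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple

Score = Callable[[int, FrozenSet[int]], float]


def optimal_score(p: int, local_score: Score) -> float:
    """Best total score over all DAGs on variables 0..p-1 (illustrative sketch).

    Levels are indexed by candidate-set size. Only the previous level of
    best-parent scores and the current level of subset totals are retained.
    """
    variables = range(p)
    # prev_bps[(v, C)]: best score of v with parents drawn from C, |C| = size - 1
    prev_bps: Dict[Tuple[int, FrozenSet[int]], float] = {}
    # total[W]: best score of any network over the variable set W
    total: Dict[FrozenSet[int], float] = {frozenset(): 0.0}

    for size in range(p):
        # Step 1: best-parent scores for every candidate set of this size.
        bps: Dict[Tuple[int, FrozenSet[int]], float] = {}
        for combo in combinations(variables, size):
            cand = frozenset(combo)
            for v in variables:
                if v in cand:
                    continue
                best = local_score(v, cand)
                for c in cand:
                    # Either use all of cand as parents or drop one candidate.
                    best = max(best, prev_bps[(v, cand - {c})])
                bps[(v, cand)] = best

        # Step 2: choose the best sink for every subset one element larger.
        new_total: Dict[FrozenSet[int], float] = {}
        for combo in combinations(variables, size + 1):
            w = frozenset(combo)
            new_total[w] = max(total[w - {s}] + bps[(s, w - {s})] for s in w)

        # Step 3: discard the older level; only adjacent levels stay in memory.
        prev_bps, total = bps, new_total

    return total[frozenset(variables)]


def toy_score(v: int, parents: FrozenSet[int]) -> float:
    """Hypothetical local score: reward parent v-1, penalize each extra parent."""
    return (1.0 if (v - 1) in parents else 0.0) - 0.5 * len(parents)


print(optimal_score(4, toy_score))  # prints 1.5 (the chain 0 -> 1 -> 2 -> 3)
```

Recovering the optimal network itself would additionally require remembering, for each subset, which sink and parent set achieved the maximum; how the paper stores or discards that information is not described in this summary.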

Paper Structure

This paper contains 12 sections, 33 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Three Markov equivalent Bayesian Networks.
  • Figure 2: Five essential steps in the state-of-the-art algorithm.
  • Figure 3: Illustration of the level-by-level computation process in the proposed algorithm. Yellow indicates the combination being computed, while blue represents the data and information required for the computation.
  • Figure 4: Comparison of different metrics.
  • Figure 5: Verification of the stability of the proposed method.
  • ...and 2 more figures