Enhancing Bayesian Network Structural Learning with Monte Carlo Tree Search
Jorge D. Laborda, Pablo Torrijos, José M. Puerta, José A. Gámez
TL;DR
This work presents MCTS-BN, an adaptation of Monte Carlo Tree Search for Bayesian Network structural learning that searches over ancestral (topological) orders and uses a constrained Hill Climbing learner to score each order. To manage the enormous search space, it employs a semi-random rollout guided by orders derived from heuristic algorithms such as HC, GES, and PC, together with a standardized reward based on the $nBDeu$ score and a tuned UCT formula. Empirical results on six real BN benchmarks show that MCTS-BN consistently improves on its base learners, can surpass gold-standard results when supplied with favorable orders, and keeps runtimes reasonable even with many iterations. The approach performs robustly in high-dimensional BN learning, and the authors point to future work on integrating more base learners and on distributed implementations.
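The selection and backpropagation steps mentioned above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the exact "tuned UCT formula" and reward standardization are not given here, so the sketch assumes the textbook UCT1 rule with an exploration constant `c`, and a z-score of each raw score (e.g. $nBDeu$) against all scores observed so far. The names `Node`, `standardize`, `uct_value`, `select_child`, and `backpropagate` are illustrative.

```python
import math
import statistics


class Node:
    """Hypothetical search-tree node; `prefix` is a partial ancestral order."""

    def __init__(self, prefix, parent=None):
        self.prefix = tuple(prefix)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0
        if parent is not None:
            parent.children.append(self)  # register with the parent node


def standardize(score, observed):
    """Assumed reward: z-score of a raw score against previously observed scores."""
    if len(observed) < 2:
        return 0.0
    sd = statistics.stdev(observed)
    if sd == 0:
        return 0.0
    return (score - statistics.mean(observed)) / sd


def uct_value(child, parent_visits, c=math.sqrt(2)):
    """Textbook UCT1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node, c=math.sqrt(2)):
    """Pick the child with the highest UCT value."""
    return max(node.children, key=lambda ch: uct_value(ch, node.visits, c))


def backpropagate(node, reward):
    """Add the reward to every node on the path back to the root."""
    while node is not None:
        node.visits += 1
        node.total_reward += reward
        node = node.parent
```

In a full loop, the value passed to `backpropagate` would be `standardize(raw_score, history)` for the structure learned from the completed order. Standardizing matters because raw BDeu-style scores are large negative numbers whose scale depends on the dataset, so without rescaling the exploration term would be negligible; `c` is the knob a "tuned" UCT variant would adjust.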
Abstract
This article presents MCTS-BN, an adaptation of the Monte Carlo Tree Search (MCTS) algorithm for the structural learning of Bayesian Networks (BNs). Initially designed for game tree exploration, MCTS is repurposed here to learn BN structures by exploring the search space of ancestral orders over the network's variables. For each order, MCTS-BN then employs Hill Climbing (HC) to derive a BN structure consistent with that order. In large BNs, where the space of variable orders becomes vast, using completely random orders during the rollout phase is often unreliable and impractical. To address this challenge, we adopt a semi-randomized approach that incorporates variable orders obtained from other heuristic search algorithms such as Greedy Equivalent Search (GES), PC, or HC itself. This hybrid strategy reduces the computational burden and makes the rollout process more reliable. Experimental evaluations demonstrate that MCTS-BN improves BNs generated by traditional structural learning algorithms, performs robustly even when the base algorithm's orders are suboptimal, and surpasses the gold standard when provided with favorable orders.
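The rollout described above, completing a partial order with help from a heuristic order and then learning a structure restricted to that order, might look roughly like this. It is a sketch under stated assumptions, not the paper's procedure: we assume that with probability `p_follow` the rollout appends the earliest remaining variable of the heuristic order (e.g. one derived from a GES, PC, or HC network) and otherwise picks a random remaining variable, and that HC only considers parents that precede each variable in the order. `semi_random_completion`, `order_constrained_hc`, `p_follow`, `max_parents`, and the `local_score` callback (standing in for a decomposable score such as BDeu) are all hypothetical.

```python
import random


def semi_random_completion(prefix, heuristic_order, p_follow=0.8, rng=random):
    """Extend a partial order to a full one (assumed semi-random policy).

    With probability p_follow, append the earliest remaining variable in
    heuristic_order; otherwise append a uniformly random remaining variable.
    """
    remaining = [v for v in heuristic_order if v not in prefix]
    order = list(prefix)
    while remaining:
        if rng.random() < p_follow:
            nxt = remaining[0]
        else:
            nxt = rng.choice(remaining)
        remaining.remove(nxt)
        order.append(nxt)
    return order


def order_constrained_hc(order, local_score, max_parents=3):
    """Greedy add/remove of parents, restricted to each variable's predecessors.

    Every parent precedes its child in `order`, so the result is acyclic; with a
    decomposable score each variable's parent set can be improved independently.
    """
    parents = {v: set() for v in order}
    for i, v in enumerate(order):
        candidates = order[:i]
        current = local_score(v, frozenset(parents[v]))
        while True:
            best_delta, best_move = 0.0, None
            for u in candidates:
                if u in parents[v]:
                    trial = parents[v] - {u}  # try removing an edge u -> v
                elif len(parents[v]) < max_parents:
                    trial = parents[v] | {u}  # try adding an edge u -> v
                else:
                    continue
                delta = local_score(v, frozenset(trial)) - current
                if delta > best_delta:
                    best_delta, best_move = delta, trial
            if best_move is None:
                break  # local optimum for this variable
            parents[v] = best_move
            current = local_score(v, frozenset(best_move))
    total = sum(local_score(v, frozenset(parents[v])) for v in order)
    return parents, total
```

One rollout from a tree node would then be `order = semi_random_completion(node.prefix, heuristic_order)` followed by `parents, score = order_constrained_hc(order, local_score)`. Restricting parents to predecessors removes the acyclicity check and edge reversals from HC, which is what makes scoring many orders affordable; the price is that a poor order can rule out good edges entirely.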
