RLBayes: a Bayesian Network Structure Learning Algorithm via Reinforcement Learning-Based Search Strategy

Mingcan Wang, Junchang Xin, Luxuan Qu, Qi Chen, Zhiqiong Wang

TL;DR

RLBayes tackles score-based Bayesian network (BN) structure learning, an NP-hard problem, by combining a reinforcement-learning-style search with a dynamically maintained $Q$-table that explores the vast space of BN structures. The paper claims theoretical convergence to the global optimum under reasonable parameter settings and reports experiments in which RLBayes achieves higher reconstruction accuracy and robustness than several heuristic baselines, particularly on larger networks. By pairing local search moves with a memory of past states, the method can escape local optima, and it obtains favorable F1 and AUC scores while keeping its parameters relatively tractable. The approach is aimed at scalable BN structure learning in domains that require reliable probabilistic graphical models under uncertainty.

Abstract

Score-based structure learning is an effective way to learn Bayesian network (BN) models, which are regarded as some of the most compelling probabilistic graphical models for representation and reasoning under uncertainty. However, the search space of structure learning grows super-exponentially with the number of variables, and BN structure learning is both NP-hard and a combinatorial optimization problem (COP). Although many heuristic methods have had some success, the structures they learn are usually unsatisfactory. Inspired by Q-learning, this paper proposes RLBayes, a BN structure learning algorithm based on a reinforcement learning (RL) search strategy. The method borrows the idea of RL, using a dynamically maintained Q-table to record and guide the learning process. By creating and maintaining this dynamic Q-table, RLBayes represents the vast search space within limited memory, which makes BN structure learning via Q-learning feasible. RLBayes is proved theoretically to converge to the globally optimal BN structure, and experiments show that it outperforms almost all of the other heuristic search algorithms compared.
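To make the idea of a dynamically maintained Q-table concrete, here is a minimal Python sketch of tabular Q-learning over directed acyclic graphs. It is an illustration under stated assumptions, not the RLBayes algorithm itself. The names `is_acyclic`, `neighbors`, `toy_score` and `q_search` are hypothetical, as are the move set (single-edge add, delete, reverse), the reward (score difference between successive graphs) and the parameter values. The toy score counts edge disagreements with a known target graph and merely stands in for a data-driven score such as BIC. The Q-table is a dictionary that gains an entry only when a (state, action) pair is actually visited, so its size follows the search trajectory rather than the full space of structures.

```python
import random
from itertools import permutations


def is_acyclic(edges, n):
    """Kahn's algorithm: True if the directed graph on n nodes has no cycle."""
    indeg = [0] * n
    children = {i: [] for i in range(n)}
    for u, v in edges:
        indeg[v] += 1
        children[u].append(v)
    stack = [i for i in range(n) if indeg[i] == 0]
    seen = 0
    while stack:
        u = stack.pop()
        seen += 1
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return seen == n


def neighbors(state, n):
    """Yield (action, next_state) for each single-edge add, delete, or
    reverse that keeps the graph acyclic. States are frozensets of edges."""
    for u, v in permutations(range(n), 2):
        if (u, v) in state:
            yield ("del", u, v), state - {(u, v)}
            reversed_graph = (state - {(u, v)}) | {(v, u)}
            if is_acyclic(reversed_graph, n):
                yield ("rev", u, v), reversed_graph
        elif (v, u) not in state:
            added = state | {(u, v)}
            if is_acyclic(added, n):
                yield ("add", u, v), added


def toy_score(state, target):
    """Placeholder score (assumption): minus the number of edges on which
    the graph and the target disagree. A real learner would score data."""
    return -len(state ^ target)


def q_search(n, score, episodes=50, steps=15,
             alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Epsilon-greedy tabular Q-learning over DAGs on n >= 2 nodes."""
    rng = random.Random(seed)
    q = {}  # (state, action) -> value, filled lazily as pairs are visited
    best = frozenset()
    best_score = score(best)
    for _ in range(episodes):
        state = frozenset()  # every episode restarts from the empty graph
        for _ in range(steps):
            moves = list(neighbors(state, n))
            if rng.random() < eps:
                action, nxt = rng.choice(moves)  # explore
            else:  # exploit: highest stored value, unseen pairs count as 0
                action, nxt = max(
                    moves, key=lambda m: q.get((state, m[0]), 0.0))
            reward = score(nxt) - score(state)
            future = max(
                (q.get((nxt, a), 0.0) for a, _ in neighbors(nxt, n)),
                default=0.0)
            old = q.get((state, action), 0.0)
            # Standard Q-learning update rule.
            q[(state, action)] = old + alpha * (reward + gamma * future - old)
            state = nxt
            if score(state) > best_score:
                best, best_score = state, score(state)
    return best, best_score, q


target = frozenset({(0, 1), (1, 2), (0, 3)})
best, best_score, q = q_search(4, lambda g: toy_score(g, target))
print(sorted(best), best_score, len(q))
```

The script prints the best graph visited, its toy score, and the number of Q-table entries created. Keying the dictionary on hashable `frozenset` states is one simple way to keep memory proportional to the states actually explored.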

Paper Structure

This paper contains 14 sections, 7 equations, 4 figures, 3 tables, 2 algorithms.

Figures (4)

  • Figure 1: Operation and Reverse Operation
  • Figure 2: Overall iteration procedure of RLBayes
  • Figure 3: An example of the maintenance of $table_q$
  • Figure 4: Analysis of the Parameters of RLBayes

Theorems & Definitions (2)

  • Example 1
  • proof