Learning Discretized Bayesian Networks with GOMEA

Damy M. F. Ha; Tanja Alderliesten; Peter A. N. Bosman

TL;DR

The paper addresses learning discretized Bayesian networks from real-valued data by extending the state-of-the-art BN-GOMEA to jointly optimize discretizations during structure learning, forming DBN-GOMEA. It introduces a density-based fitness with a complexity penalty and encodes the number of discretization bins per variable in the solution. This supports equal-width (EW) and equal-frequency (EF) discretizations during structure learning, as well as further optimization of the discretization after structure learning via RV-GOMEA/BD approaches. A multi-objective variant (MO-DBN-GOMEA) additionally optimizes accuracy, complexity, and similarity to an expert network (via KL divergence), yielding multiple trade-off models. Empirical results show that DBN-GOMEA matches or surpasses current methods at retrieving randomly generated ground-truth networks, while MO-DBN-GOMEA enables exploring solutions aligned with expert knowledge, which enhances explainability. The work also discusses future directions, including more sophisticated discretization schemes and mixed-integer formulations, to further improve learning performance and interpretability.
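
As a rough illustration of the two discretization schemes named above, the following sketch computes equal-width and equal-frequency cut points for a single variable and maps samples to bin indices. It is not the authors' implementation; the function names (`equal_width_edges`, `equal_frequency_edges`, `discretize`) and the sample data are assumptions made for this example only.

```python
from bisect import bisect_right

def equal_width_edges(values, k):
    """Return the k-1 interior cut points that split [min, max] into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_edges(values, k):
    """Return k-1 interior cut points so each bin holds roughly len(values)/k samples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def discretize(values, edges):
    """Map each value to a bin index in 0..len(edges) using the cut points."""
    return [bisect_right(edges, v) for v in values]

# Illustrative data (not from the paper): seven small values and one outlier.
data = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 10.0]
ew_bins = discretize(data, equal_width_edges(data, 4))
ef_bins = discretize(data, equal_frequency_edges(data, 4))
print(ew_bins)  # the outlier stretches the range, so most samples share one bin
print(ef_bins)  # samples are spread evenly across the four bins
```

With skewed data like this, equal-width bins leave most samples in a single bin, while equal-frequency bins keep the counts balanced. This is one reason the choice of scheme and the number of bins per variable can matter for the learned network.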

Abstract

Bayesian networks model relationships between random variables under uncertainty and can be used to predict the likelihood of events and outcomes while incorporating observed evidence. From an eXplainable AI (XAI) perspective, such models are interesting as they tend to be compact. Moreover, captured relations can be directly inspected by domain experts. In practice, data is often real-valued. Unless assumptions of normality can be made, discretization is often required. The optimal discretization, however, depends on the relations modelled between the variables. This complicates learning Bayesian networks from data. For this reason, most literature focuses on learning conditional dependencies between sets of variables, called structure learning. In this work, we extend an existing state-of-the-art structure learning approach based on the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) to jointly learn variable discretizations. The proposed Discretized Bayesian Network GOMEA (DBN-GOMEA) obtains similar or better results than the current state-of-the-art when tasked to retrieve randomly generated ground-truth networks. Moreover, leveraging a key strength of evolutionary algorithms, we can straightforwardly perform DBN learning multi-objectively. We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade off complexity, accuracy, and the difference with a pre-determined expert network.
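
To make the idea of "multiple DBNs that trade off" three objectives concrete, here is a minimal sketch of a non-dominated (Pareto) filter. It assumes all three objectives are expressed so that lower is better (for instance, error instead of accuracy); the helper names and the candidate values are hypothetical and are not taken from the paper or from MO-DBN-GOMEA itself.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (complexity, error, distance_to_expert): illustrative numbers only, not results from the paper.
candidates = [
    (10, 0.30, 0.5),
    (20, 0.20, 0.9),
    (20, 0.25, 0.9),  # dominated: same complexity and distance as the previous one, higher error
    (15, 0.35, 0.1),
]
front = non_dominated(candidates)
print(front)
```

Each surviving tuple represents a different compromise, for example a simpler network that is less accurate, or a network closer to the expert's that fits the data slightly worse. A domain expert can then choose among them rather than receiving a single model.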

Paper Structure (20 sections, 7 equations, 6 figures, 2 tables)

Introduction
Discrete Bayesian Networks
Bayesian Network GOMEA
Discretization of Continuous Random Variables in Bayesian Networks
DBN-GOMEA
Post-structure Learning Discretization
Bayesian Method
Multi-Objective Learning
MO-DBN-GOMEA
Experiments and results
Network Generation
Metrics
Single-Objective Scalability
Single-Objective Scalability in Terms of Sample Size
Single-Objective Scalability in Terms of Random Variables
...and 5 more sections

Figures (6)

Figure 1: Example of a DAG used to represent a BN (in black) and all possible edges (in grey).
Figure 2: Impression (rotated visualizations) of the approximation front of a multi-objective run together with the objective values of the ground truth and expert solutions. The x, y, and z axes are on a log-log-linear scale.
Figure 3: Scalability in terms of sample size for 30 random networks with 8 random variables having EW, EF and Random probability distributions. The solid lines are medians, while the shaded areas encompass the first and third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
Figure 4: Scalability in terms of the number of random variables. For each number of random variables on the x-axis, 30 ground truth networks were generated with random probability distributions. The solid lines are medians, while the shaded areas encompass the first and third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
Figure 5: Optimizing the discretization after structure learning. The networks of Figure 3, obtained using DBN-GOMEA-EW and DBN-GOMEA-EF, are further optimized. The lines indicate the median, while the shaded regions encompass the first and third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
...and 1 more figures
