Learning Discretized Bayesian Networks with GOMEA

Damy M. F. Ha, Tanja Alderliesten, Peter A. N. Bosman

TL;DR

The paper addresses learning discretized Bayesian networks from real-valued data by extending the state-of-the-art structure learner BN-GOMEA to optimize variable discretizations jointly with the network structure, yielding DBN-GOMEA. It introduces a density-based fitness function with a complexity penalty and encodes discretization counts in the solution, which enables equal-width (EW) and equal-frequency (EF) discretizations as well as optimizing the discretization after structure learning via the RV-GOMEA/BD approaches. A multi-objective variant (MO-DBN-GOMEA) simultaneously optimizes accuracy, complexity, and similarity to an expert network (measured via KL divergence), yielding multiple trade-off models. Empirical results show that DBN-GOMEA matches or surpasses current methods at retrieving randomly generated ground-truth networks, while MO-DBN-GOMEA makes it possible to explore solutions aligned with expert knowledge, which enhances explainability. The work also discusses future directions, including more sophisticated discretization schemes and mixed-integer formulations, to further improve learning performance and interpretability.
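
To make the two discretization schemes named above concrete, here is a minimal NumPy sketch, assuming EW and EF stand for equal-width and equal-frequency binning. The function names and the toy data are hypothetical and are not taken from the paper; the sketch only illustrates how the two schemes place cut points for a fixed number of bins, not how DBN-GOMEA chooses that number.

```python
import numpy as np

def equal_width_edges(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins equally wide intervals."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_bins + 1)[1:-1]

def equal_frequency_edges(values, n_bins):
    """Interior cut points at quantiles, so each bin holds roughly the same number of samples."""
    probs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.quantile(values, probs)

def discretize(values, edges):
    """Map each real value to a bin index in 0 .. len(edges)."""
    return np.digitize(values, edges)

# Toy data with one outlier: EW puts almost everything in one bin, EF balances the bins.
values = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
ew_counts = np.bincount(discretize(values, equal_width_edges(values, 2)), minlength=2)
ef_counts = np.bincount(discretize(values, equal_frequency_edges(values, 2)), minlength=2)
print("EW bin counts:", ew_counts)
print("EF bin counts:", ef_counts)
```

On skewed data like this, the equal-width cut sits far from most samples, whereas the equal-frequency cut follows the data density, which is one reason the better scheme can differ per variable.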

Abstract

Bayesian networks model relationships between random variables under uncertainty and can be used to predict the likelihood of events and outcomes while incorporating observed evidence. From an eXplainable AI (XAI) perspective, such models are interesting as they tend to be compact. Moreover, captured relations can be directly inspected by domain experts. In practice, data is often real-valued. Unless assumptions of normality can be made, discretization is often required. The optimal discretization, however, depends on the relations modelled between the variables. This complicates learning Bayesian networks from data. For this reason, most literature focuses on learning conditional dependencies between sets of variables, called structure learning. In this work, we extend an existing state-of-the-art structure learning approach based on the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) to jointly learn variable discretizations. The proposed Discretized Bayesian Network GOMEA (DBN-GOMEA) obtains similar or better results than the current state-of-the-art when tasked to retrieve randomly generated ground-truth networks. Moreover, leveraging a key strength of evolutionary algorithms, we can straightforwardly perform DBN learning multi-objectively. We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade off complexity, accuracy, and the difference with a pre-determined expert network.
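
The idea of returning several networks that trade off multiple objectives can be illustrated with a generic Pareto non-dominance filter. This is a minimal sketch and not the paper's algorithm: it assumes every objective is expressed so that lower is better, and the objective tuples (error, complexity, distance to the expert network) are made-up numbers used purely for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one.

    Assumes all objectives are to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep only the points that no other point dominates (the approximation front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical objective vectors: (error, complexity, distance to expert network).
candidates = [
    (1.0, 5.0, 2.0),
    (2.0, 4.0, 2.0),
    (2.0, 6.0, 3.0),  # worse than the first candidate in every objective
    (1.0, 5.0, 2.5),  # equal to the first candidate except for a larger expert distance
]
front = non_dominated(candidates)
print(front)
```

Each network left in the front is better than every other remaining network in at least one objective, so a domain expert can inspect the set and choose the preferred balance.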

Paper Structure (20 sections, 7 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Example of a DAG used to represent a BN (in black) and all possible edges (in grey).
  • Figure 2: Impression (rotated visualizations) of the approximation front of a multi-objective run together with the objective values of the ground truth and expert solutions. The x, y, and z axes are on a log-log-linear scale.
  • Figure 3: Scalability in terms of sample size for 30 random networks with 8 random variables having EW, EF and Random probability distributions. The solid lines are medians, while the shaded areas span the first to third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
  • Figure 4: The scalability in terms of number of random variables. For each number of random variables on the x-axis, 30 ground truth networks were generated with random probability distributions. The solid lines are medians, while the shaded areas span the first to third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
  • Figure 5: Optimizing the discretization after structure learning. The networks of Figure 3, obtained using DBN-GOMEA-EW and DBN-GOMEA-EF, are further optimized. The lines indicate the median, while the shaded regions span the first to third quartiles. The arrows on the y-axis point in the direction of improvement per metric.
  • ...and 1 more figures