Learning Discretized Bayesian Networks with GOMEA
Damy M. F. Ha, Tanja Alderliesten, Peter A. N. Bosman
TL;DR
The paper addresses learning discretized Bayesian networks from real-valued data by extending the state-of-the-art BN-GOMEA to jointly optimize discretizations during structure learning, resulting in DBN-GOMEA. It introduces a density-based fitness function with a complexity penalty and encodes the number of discretization intervals per variable in the solution, enabling equal-width (EW) and equal-frequency (EF) discretizations, as well as discretization after structure learning via RV-GOMEA/BD approaches. A multi-objective variant (MO-DBN-GOMEA) additionally optimizes accuracy, complexity, and similarity to an expert network (measured by KL divergence), yielding a set of models with different trade-offs. Empirical results show that DBN-GOMEA matches or surpasses current methods at retrieving randomly generated ground-truth networks, while MO-DBN-GOMEA enables the exploration of solutions aligned with expert knowledge, thus enhancing explainability. The work also discusses future directions, including more sophisticated discretization schemes and mixed-integer formulations, to further improve learning performance and interpretability.
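To make the EW/EF discretizations and the idea of a density-based fitness with a complexity penalty more concrete, the sketch below treats a single variable only. It assumes that a discretization implies a piecewise-uniform density (bin probability divided by bin width) and uses a BIC-style penalty on the number of bins as a stand-in; all function names and the penalty form are hypothetical and are not the paper's fitness, which scores a whole network.

```python
import math
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return k+1 edges splitting [min, max] into k bins of equal width."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + i * step for i in range(k)] + [hi]


def equal_frequency_edges(values, k):
    """Return k+1 edges so each bin holds about len(values) / k points.

    Inner edges are midpoints between neighbouring sorted values, which keeps
    every bin width positive as long as the values are distinct (assumption).
    """
    xs = sorted(values)
    n = len(xs)
    inner = []
    for i in range(1, k):
        j = (i * n) // k  # index of the first point that goes into bin i
        inner.append((xs[j - 1] + xs[j]) / 2)
    return [xs[0]] + inner + [xs[-1]]


def bin_index(x, edges):
    """Bin of x; the maximum value is assigned to the last bin."""
    return min(bisect_right(edges, x) - 1, len(edges) - 2)


def bin_counts(values, edges):
    """Number of values falling into each bin."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        counts[bin_index(x, edges)] += 1
    return counts


def penalized_density_loglik(values, edges, penalty_weight=1.0):
    """Hypothetical single-variable score: log-likelihood of the real-valued
    data under the piecewise-uniform density implied by the discretization,
    minus a BIC-style penalty on the number of bins (assumed penalty form)."""
    n = len(values)
    k = len(edges) - 1
    loglik = 0.0
    for c, left, right in zip(bin_counts(values, edges), edges[:-1], edges[1:]):
        if c > 0:
            # density inside a bin = P(bin) / width(bin)
            loglik += c * math.log((c / n) / (right - left))
    # k - 1 free bin probabilities are penalized
    return loglik - penalty_weight * 0.5 * (k - 1) * math.log(n)


if __name__ == "__main__":
    data = [0.1, 0.2, 0.25, 0.3, 2.0, 2.1, 5.0, 9.0]
    for k in (1, 2, 3, 4):
        ew = penalized_density_loglik(data, equal_width_edges(data, k))
        ef = penalized_density_loglik(data, equal_frequency_edges(data, k))
        print(f"k={k}: EW score {ew:.2f}, EF score {ef:.2f}")
```

Comparing scores across bin counts and across EW/EF mimics, in a heavily reduced form, choosing a discretization by fitness; in DBN-GOMEA this choice is made jointly with the network structure.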
Abstract
Bayesian networks model relationships between random variables under uncertainty and can be used to predict the likelihood of events and outcomes while incorporating observed evidence. From an eXplainable AI (XAI) perspective, such models are interesting as they tend to be compact. Moreover, captured relations can be directly inspected by domain experts. In practice, data is often real-valued. Unless assumptions of normality can be made, discretization is often required. The optimal discretization, however, depends on the relations modelled between the variables. This complicates learning Bayesian networks from data. For this reason, most literature focuses on learning conditional dependencies between sets of variables, a task called structure learning. In this work, we extend an existing state-of-the-art structure learning approach based on the Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) to jointly learn variable discretizations. The proposed Discretized Bayesian Network GOMEA (DBN-GOMEA) obtains similar or better results than the current state-of-the-art when tasked with retrieving randomly generated ground-truth networks. Moreover, leveraging a key strength of evolutionary algorithms, we can straightforwardly perform DBN learning multi-objectively. We show how this enables incorporating expert knowledge in a uniquely insightful fashion, finding multiple DBNs that trade off complexity, accuracy, and the difference with a pre-determined expert network.
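Learning multi-objectively returns a set of DBNs rather than a single one. As a rough illustration, the sketch below assumes that each candidate has already been scored on complexity and negative log-likelihood, and that the difference with the expert network is a KL divergence between explicit joint probability tables over the same discretized outcomes. The KL direction, the objective values, and all names are hypothetical; real networks with differing discretizations would need a more careful, factorized computation. The sketch keeps only the candidates that no other candidate Pareto-dominates.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in
    at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(candidates: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, obj in candidates.items()
        if not any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical joint distributions over the same four discretized outcomes.
expert = [0.4, 0.1, 0.1, 0.4]
learned = {"A": [0.4, 0.1, 0.1, 0.4], "B": [0.3, 0.2, 0.2, 0.3], "C": [0.3, 0.2, 0.2, 0.3]}
# Hypothetical (complexity, negative log-likelihood) values per candidate.
scores = {"A": (3, 100.0), "B": (7, 90.0), "C": (8, 95.0)}

# Objectives to minimize: complexity, negative log-likelihood, KL to the expert.
candidates = {
    name: scores[name] + (kl_divergence(learned[name], expert),)
    for name in scores
}

if __name__ == "__main__":
    print(non_dominated(candidates))  # prints ['A', 'B']
```

Candidate C is dropped because B is simpler, fits better, and is equally close to the expert; A and B remain as genuine trade-offs between fit on one side and complexity and expert similarity on the other.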
