Federated Bayesian Network Ensembles

Florian van Daalen, Lianne Ippel, Andre Dekker, Inigo Bermejo

TL;DR

Federated Bayesian Network Ensembles (FBNE) introduces an ensemble of locally trained Bayesian networks for federated learning to address non-IID data and population bias across data sites. Unlike centralized BN training, FBNE relies on local computation and combines the local models through weighted voting, which substantially reduces runtime while achieving accuracy comparable to that of VertiBayes in many scenarios. The study finds FBNE particularly effective when little data is missing and when populations exhibit biases that ensembling can exploit, though VertiBayes can outperform FBNE in missing-data settings. Overall, FBNE offers a practical, faster option for privacy-preserving federated inference and serves as a strong exploratory tool before committing to more complex federated BN methods.

Abstract

Federated learning allows us to run machine learning algorithms on decentralized data when data sharing is not permitted due to privacy concerns. Ensemble-based learning works by training multiple (weak) classifiers whose output is aggregated. Federated ensembles are ensembles applied to a federated setting, where each classifier in the ensemble is trained on one data location. In this article, we explore the use of federated ensembles of Bayesian networks (FBNE) in a range of experiments and compare their performance with locally trained models and models trained with VertiBayes, a federated learning algorithm to train Bayesian networks from decentralized data. Our results show that FBNE outperforms local models and provides a significant increase in training speed compared with VertiBayes while maintaining a similar performance in most settings, among other advantages. We show that FBNE is a potentially useful tool within the federated learning toolbox, especially when local populations are heavily biased, or there is a strong imbalance in population size across parties. We discuss the advantages and disadvantages of this approach in terms of time complexity, model accuracy, privacy protection, and model interpretability.
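
The aggregation step of a federated ensemble can be illustrated with a minimal sketch. It assumes each site's locally trained model has already produced class probabilities for the same record, and it combines them by a weighted average. The function names, the example probabilities, and the site weights are hypothetical; the weighting scheme actually used by FBNE is not specified in this summary.

```python
from typing import Dict, List


def weighted_vote(local_probs: List[Dict[str, float]],
                  weights: List[float]) -> Dict[str, float]:
    """Combine per-site class probabilities by a weighted average.

    local_probs: one {label: probability} dict per site (hypothetical input).
    weights: one non-negative weight per site (assumed, e.g. by sample size).
    """
    if len(local_probs) != len(weights):
        raise ValueError("need exactly one weight per local model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined: Dict[str, float] = {}
    for probs, weight in zip(local_probs, weights):
        for label, p in probs.items():
            # Normalise the weight so the result is again a distribution.
            combined[label] = combined.get(label, 0.0) + (weight / total) * p
    return combined


def predict(local_probs: List[Dict[str, float]],
            weights: List[float]) -> str:
    """Return the label with the highest combined probability."""
    combined = weighted_vote(local_probs, weights)
    return max(combined, key=combined.get)


# Hypothetical outputs of three locally trained models for one record.
site_predictions = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.3, "no": 0.7},
]
# Assumed weights; the first site counts as much as the other two together.
site_weights = [0.5, 0.25, 0.25]

ensemble = weighted_vote(site_predictions, site_weights)
label = predict(site_predictions, site_weights)
```

In this example two of the three sites favour "no", so an unweighted majority vote would return "no", while the weighted average favours "yes" because the first site carries half of the total weight. Only model outputs are combined; no site's records are shared.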

Paper Structure

This paper contains 21 sections, 5 figures, and 11 tables.

Figures (5)

  • Figure 1: Two local networks based on locally available data
  • Figure 2: Global network created by combining the two local models from Figure 1 while utilizing expert knowledge.
  • Figure 3: Horizontally split data
  • Figure 4: Vertically split data
  • Figure 5: Hybrid split data