Federated Learning of Dynamic Bayesian Network via Continuous Optimization from Time Series Data

Jianhong Chen, Ying Ma, Xubo Yue

TL;DR

The paper tackles learning Dynamic Bayesian Network (DBN) structures from time-series data in distributed, privacy-preserving settings, addressing data heterogeneity across clients. It introduces two methods: Federated DBN Learning (FDBNL) for homogeneous data, which uses ADMM-based continuous optimization, and Personalized Federated DBN Learning (PFDBNL) for heterogeneous data, which adds proximal regularization to the ADMM-based framework. Across synthetic and real-world datasets, including DREAM4 and fMRI, the approaches outperform baselines in challenging, highly distributed scenarios, with PFDBNL offering notable gains in personalization. The work advances scalable, privacy-aware causal structure inference for dynamic systems and lays groundwork for extensions to asynchronous federated optimization and nonlinear dependencies.

Abstract

Traditionally, learning the structure of a Dynamic Bayesian Network has been centralized, requiring all data to be pooled in one location. However, in real-world scenarios, data are often distributed across multiple entities (e.g., companies, devices) that seek to collaboratively learn a Dynamic Bayesian Network while preserving data privacy and security. More importantly, due to the presence of diverse clients, the data may follow different distributions, resulting in data heterogeneity. This heterogeneity poses additional challenges for centralized approaches. In this study, we first introduce a federated learning approach for estimating the structure of a Dynamic Bayesian Network from homogeneous time series data that are horizontally distributed across different parties. We then extend this approach to heterogeneous time series data by incorporating a proximal operator as a regularization term in a personalized federated learning framework. To this end, we propose FDBNL and PFDBNL, which leverage continuous optimization, ensuring that only model parameters are exchanged during the optimization process. Experimental results on synthetic and real-world datasets demonstrate that our methods outperform state-of-the-art techniques, particularly in scenarios with many clients and limited individual sample sizes.
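To make the idea that only model parameters are exchanged more concrete, here is a minimal sketch of consensus ADMM on a deliberately simplified problem: ordinary least squares split across clients. It is not the paper's FDBNL objective (there is no acyclicity constraint, sparsity penalty, or time lag), and the function name, `rho`, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np


def consensus_admm_least_squares(client_data, rho=1.0, n_iters=300):
    """Consensus ADMM (scaled form) for least squares split across clients.

    client_data: list of (X_k, y_k) pairs, one per client (hypothetical layout).
    Only the local estimates w_k and duals u_k leave a client, never X_k or y_k.
    """
    d = client_data[0][0].shape[1]
    z = np.zeros(d)                             # shared (server) estimate
    u = [np.zeros(d) for _ in client_data]      # scaled dual variable per client

    # Each client precomputes its own normalized Gram matrix and right-hand side.
    grams = [X.T @ X / len(y) for X, y in client_data]
    rhs = [X.T @ y / len(y) for X, y in client_data]

    for _ in range(n_iters):
        # Client step: local loss plus a quadratic pull toward the shared z.
        w = [
            np.linalg.solve(G + rho * np.eye(d), b + rho * (z - u_k))
            for G, b, u_k in zip(grams, rhs, u)
        ]
        # Server step: average the parameter vectors it received.
        z = np.mean([w_k + u_k for w_k, u_k in zip(w, u)], axis=0)
        # Dual step: accumulate each client's disagreement with the consensus.
        u = [u_k + w_k - z for u_k, w_k in zip(u, w)]
    return z


# Synthetic example: 4 clients with 50 samples each and a shared true weight vector.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

z_hat = consensus_admm_least_squares(clients)
```

With equally sized clients, the consensus estimate `z_hat` should match the least-squares fit on the pooled data, even though no client ever shares its raw samples.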

Paper Structure

This paper contains 35 sections, 43 equations, 12 figures, 11 tables, and 2 algorithms.

Figures (12)

  • Figure 1: Overview of Personalized Federated DBN learning with $d = 3$ nodes, autoregression order $p = 2$, and $K = 3$ clients. The only difference between it and Federated DBN learning is that Federated DBN learning requires each $W_k$ and $A_{k_i}$ to be identical across clients.
  • Figure 2: An example result using FDBNL for Gaussian noise data with $n = 500$ samples, $d = 5$ variables, an autoregressive order $p = 3$, and $K = 10$ clients. All clients share the same $W$ and $A$. We set the thresholds $\tau_w = \tau_a = 0.3$. Our algorithm recovers weights close to the ground truth.
  • Figure 3: Structure learning results for $W$ in a DBN with Gaussian noise for $d = 5, 10, 15, 20$ variables, an autoregressive order $p = 1$, and $K = 10$ clients. Each metric value indicates the mean performance across 10 different simulated datasets.
  • Figure 4: Structure learning for $W$ of a DBN with Gaussian noise for $d = 10$ variables, $p = 1$, and varying numbers of clients. There are $n = 512$ total samples, distributed evenly across $K \in \{2, 4, 8, 16, 32, 64\}$ clients. Each metric value indicates the mean performance across 10 different simulated datasets.
  • Figure 5: Structure learning for $W$ of a DBN with Gaussian noise for $d = 20$ variables, $p = 1$, and varying numbers of clients. There are $n = 512$ total samples, distributed evenly across $K \in \{2, 4, 8, 16, 32, 64\}$ clients. Each metric value indicates the mean performance across 10 different simulated datasets.
  • ...and 7 more figures
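
The captions above mention two simple operations: distributing $n = 512$ samples evenly across $K$ clients, and applying a threshold ($\tau_w = \tau_a = 0.3$) to estimated weights. The sketch below illustrates both under stated assumptions: the helper names are hypothetical, the split keeps contiguous blocks of rows, and thresholding is taken to mean zeroing entries whose absolute value falls below $\tau$. The paper's own preprocessing may differ.

```python
import numpy as np


def split_evenly(X, num_clients):
    """Split the rows of X into contiguous, near-equal blocks (one per client)."""
    return np.array_split(X, num_clients, axis=0)


def threshold_weights(M, tau):
    """Zero out entries with |M_ij| < tau (assumed thresholding rule)."""
    return np.where(np.abs(M) >= tau, M, 0.0)


# n = 512 samples of d = 10 variables (placeholder values, not real data).
X = np.arange(512 * 10, dtype=float).reshape(512, 10)
sizes_per_K = {
    K: [len(part) for part in split_evenly(X, K)]
    for K in (2, 4, 8, 16, 32, 64)
}

# A made-up estimated weight matrix, thresholded at tau = 0.3.
W_est = np.array([
    [0.00, 0.80, 0.05],
    [-0.20, 0.00, -0.45],
    [0.10, 0.29, 0.00],
])
W_thr = threshold_weights(W_est, tau=0.3)
```

Because 512 is divisible by every listed $K$, each client receives exactly $512 / K$ samples, so the largest setting ($K = 64$) leaves only 8 samples per client.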

Theorems & Definitions (1)

  • Definition 1: Proximal Operator
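
For reference, the proximal operator is usually defined as follows. This is the standard textbook form, written in generic notation ($f$, $\lambda$, $v$ are assumed symbols here), and the paper's Definition 1 may use different symbols or scaling:

$$
\operatorname{prox}_{\lambda f}(v) = \arg\min_{x} \left( f(x) + \frac{1}{2\lambda} \lVert x - v \rVert_2^2 \right), \qquad \lambda > 0.
$$

As a familiar special case, taking $f = \lVert \cdot \rVert_1$ gives elementwise soft-thresholding:

$$
\left[ \operatorname{prox}_{\lambda \lVert \cdot \rVert_1}(v) \right]_i = \operatorname{sign}(v_i)\, \max\bigl( \lvert v_i \rvert - \lambda,\, 0 \bigr).
$$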