Differentially Private and Federated Structure Learning in Bayesian Networks

Ghita Fassy El Fehri, Aurélien Bellet, Philippe Bastien

TL;DR

The paper tackles learning Bayesian network structures from distributed data under stringent privacy and communication constraints. It introduces Fed-Sparse-BNSL, a sparsity-driven federated approach, and its differentially private variant DP-Fed-Sparse-BNSL, both leveraging Proximal Greedy Coordinate Descent (PGCD) to maintain identifiability while exchanging only sparse edge updates. A careful DP-PGCD design, which uses the exponential mechanism for coordinate selection and Gaussian perturbation of the gradient and is analyzed under zCDP, yields strong privacy-utility trade-offs, especially in high dimensions. Extensive experiments on synthetic and real data show substantial communication reductions, competitive structure recovery, and effective participant-level personalization, with clear privacy-utility behavior as the budget changes.
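The two private ingredients named above (exponential-mechanism coordinate selection and Gaussian gradient perturbation, followed by a proximal step) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the selection score `|grad_j| / Delta_j`, the `1 / L_j` step size, and the L1 proximal operator are all assumptions chosen for illustration, and the weights are treated as a flat vector with one entry per candidate edge.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |x| (assumed L1 sparsity penalty)."""
    return np.sign(x) * max(abs(x) - t, 0.0)

def dp_pgcd_step(w, grad, lipschitz, n, eps_select, sigma, lam, rng):
    """One illustrative noisy proximal greedy coordinate step.

    w, grad, lipschitz: 1-D arrays, one entry per candidate edge.
    n: number of local samples; eps_select: selection budget;
    sigma: Gaussian noise multiplier; lam: L1 penalty weight.
    All parameter names here are hypothetical.
    """
    # Assumed per-coordinate sensitivity of the gradient: 2 * L_j / n.
    delta = 2.0 * lipschitz / n
    # Exponential mechanism over coordinates with utility |grad_j| / delta_j,
    # which has sensitivity at most 1 under the assumption above.
    logits = 0.5 * eps_select * np.abs(grad) / delta
    logits -= logits.max()          # stabilize the softmax numerically
    probs = np.exp(logits)
    probs /= probs.sum()
    j = rng.choice(len(w), p=probs)
    # Gaussian mechanism on the selected gradient entry only.
    noisy_grad = grad[j] + rng.normal(0.0, sigma * delta[j])
    # Proximal gradient update on coordinate j with step size 1 / L_j.
    step = 1.0 / lipschitz[j]
    w_new = w.copy()
    w_new[j] = soft_threshold(w[j] - step * noisy_grad, step * lam)
    return w_new, j
```

Only the index `j` and the new value `w_new[j]` change per step, which is what makes sparse communication possible in this kind of scheme: a participant would need to send one (index, value) pair rather than a full weight matrix.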

Abstract

Learning the structure of a Bayesian network from decentralized data poses two major challenges: (i) ensuring rigorous privacy guarantees for participants, and (ii) avoiding communication costs that scale poorly with dimensionality. In this work, we introduce Fed-Sparse-BNSL, a novel federated method for learning linear Gaussian Bayesian network structures that addresses both challenges. By combining differential privacy with greedy updates that target only a few relevant edges per participant, Fed-Sparse-BNSL efficiently uses the privacy budget while keeping communication costs low. Our careful algorithmic design preserves model identifiability and enables accurate structure estimation. Experiments on synthetic and real datasets demonstrate that Fed-Sparse-BNSL achieves utility close to non-private baselines while offering substantially stronger privacy and communication efficiency.

Paper Structure

This paper contains 51 sections, 6 theorems, 40 equations, 4 figures, 9 tables, 2 algorithms.

Key Result

Theorem 5.1

Let $\varepsilon, \delta > 0$ and $\Delta=2L_{i,j}/n_p$, where $L_{i,j}$ is the coordinate-wise Lipschitz constant of $\mathcal{L}$. Suppose DP-Fed-Sparse-BNSL (Algorithm algo:fed-BNSL) runs for $T$ global rounds, with each local update consisting of $K$ iterations of DP-PGCD (Algorithm algo:DP-PGCD). If […], then DP-Fed-Sparse-BNSL is $(\varepsilon, \delta)$-differentially private with respect to each part…
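The theorem statement is truncated here, so its exact noise calibration is not available. As a rough guide to how a zCDP analysis of this kind typically proceeds, the sketch below uses three standard facts: a Gaussian mechanism with sensitivity $\Delta$ and noise standard deviation $\sigma$ satisfies $\Delta^2/(2\sigma^2)$-zCDP, an exponential mechanism with parameter $\varepsilon$ satisfies $\varepsilon^2/8$-zCDP, and $\rho$-zCDP implies $(\rho + 2\sqrt{\rho \ln(1/\delta)}, \delta)$-DP. The assumption that every one of the $T \cdot K$ steps performs one selection and one Gaussian release, and all function names, are illustrative and not taken from the paper.

```python
import math

def gaussian_zcdp(sensitivity, sigma):
    """zCDP cost of one Gaussian release with noise std `sigma`."""
    return sensitivity ** 2 / (2.0 * sigma ** 2)

def exp_mech_zcdp(eps):
    """zCDP cost of one exponential-mechanism selection with parameter eps."""
    return eps ** 2 / 8.0

def zcdp_to_dp(rho, delta):
    """Convert rho-zCDP to an (epsilon, delta)-DP guarantee."""
    return rho + 2.0 * math.sqrt(rho * math.log(1.0 / delta))

def total_epsilon(T, K, eps_select, sensitivity, sigma, delta):
    """Epsilon after T rounds of K steps, assuming one selection and one
    Gaussian release per step (an assumption of this sketch)."""
    rho_step = exp_mech_zcdp(eps_select) + gaussian_zcdp(sensitivity, sigma)
    # zCDP composes additively across steps.
    return zcdp_to_dp(T * K * rho_step, delta)
```

In practice one would fix a target $(\varepsilon, \delta)$ and solve for the per-step noise level, for example by bisection on `sigma` using `total_epsilon`.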

Figures (4)

  • Figure 1: Convergence of Fed-Sparse-BNSL and Fed-BNSL. Top: homogeneous synthetic data; Bottom: heterogeneous synthetic data. We report SHD, TPR and FDR across iterations.
  • Figure 2: Participant-level personalization: per-participant normalized MSE between true and estimated parameters, with and without personalization.
  • Figure 3: Privacy-utility trade-off: SHD, TPR and FDR of DP-Fed-Sparse-BNSL under varying privacy budgets $\varepsilon$, compared to non-private Fed-Sparse-BNSL.
  • Figure 4: Dimensional robustness: performance of DP-Fed-Sparse-BNSL vs DP-Fed-BNSL as dimension $d$ increases, under fixed privacy budget $\varepsilon=10$.
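The figures report SHD (structural Hamming distance), TPR (true positive rate) and FDR (false discovery rate). Conventions for these metrics vary between papers, so the sketch below is one common choice rather than the authors' evaluation code: a reversed edge counts once toward SHD and counts as a false discovery, and both graphs are assumed to be DAGs given as adjacency matrices where entry `[i, j]` is nonzero for an edge from `i` to `j`. The function name is hypothetical.

```python
import numpy as np

def structure_metrics(true_adj, est_adj):
    """Return (SHD, TPR, FDR) for two directed graphs (assumed acyclic)."""
    t = np.asarray(true_adj) != 0
    e = np.asarray(est_adj) != 0
    tp = np.sum(t & e)                 # edges recovered with correct direction
    rev = np.sum(e & t.T & ~t)         # estimated edges whose reverse is true
    fp = np.sum(e & ~t & ~t.T)         # estimated edges absent from the true skeleton
    missing = np.sum(t & ~e & ~e.T)    # true edges missing in both directions
    shd = fp + missing + rev
    tpr = tp / max(t.sum(), 1)
    fdr = (rev + fp) / max(e.sum(), 1)
    return shd, tpr, fdr
```

Lower SHD and FDR and higher TPR indicate better structure recovery.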

Theorems & Definitions (11)

  • Definition 2.1: Differential privacy
  • Theorem 5.1: Privacy of DP-Fed-Sparse-BNSL
  • Proof: Sketch of proof
  • Definition A.1: cdp
  • Theorem A.1: cdp
  • Theorem A.2: cdp
  • Theorem A.3: cdp
  • Theorem A.4: exp-zcdp
  • proof
  • Theorem D.1: Privacy of smoothness constants
  • ...and 1 more