Information-Geometric Barycenters for Bayesian Federated Learning

Nour Jamoussi, Giuseppe Serra, Photios A. Stavrou, Marios Kountouris

TL;DR

This work reframes federated learning aggregation as computing a barycenter of local posteriors on a statistical manifold, enabling an information-geometric perspective that unifies existing Bayesian aggregation methods. It introduces BA-BFL, which preserves the convergence properties of FedAvg while allowing two analytically tractable Gaussian barycenters: the reverse KL (RKL) barycenter and the squared-Wasserstein (W2) barycenter, with explicit closed-form updates for diagonal covariances. The method is extended to Hybrid Bayesian Deep Learning, showing robustness across heterogeneous data and enabling tunable uncertainty quantification via Bayesian layers; experiments on FashionMNIST, SVHN, and CIFAR-10 demonstrate competitive accuracy and improved calibration, while analyzing the trade-offs between Bayesian complexity and computation. The work lays groundwork for broader divergence choices and nonparametric extensions, and suggests personalization via clustering on the posterior manifold to further enhance performance in distributed, heterogeneous environments.
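The two closed-form diagonal-Gaussian aggregations mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the list-based data layout, and the client weights are hypothetical. It also assumes the reverse KL barycenter is the minimizer over q of the weighted sum of KL(q || p_k); under that assumption the result is a precision-weighted average. For diagonal Gaussians, the W2 barycenter averages means and standard deviations coordinate-wise.

```python
from math import sqrt


def w2_barycenter(means, stds, weights):
    """Squared-Wasserstein barycenter of diagonal Gaussians.

    means, stds: one list of per-coordinate values per client.
    weights: nonnegative client weights assumed to sum to 1.
    Returns the barycenter mean and standard deviation per coordinate.
    """
    dim = len(means[0])
    mu = [sum(w * m[j] for w, m in zip(weights, means)) for j in range(dim)]
    sigma = [sum(w * s[j] for w, s in zip(weights, stds)) for j in range(dim)]
    return mu, sigma


def rkl_barycenter(means, stds, weights):
    """Minimizer of sum_k w_k * KL(q || p_k) over diagonal Gaussians q.

    Assumption: this is the direction meant by "reverse KL" here.
    The barycenter precision is the weighted average of client precisions,
    and the mean is the precision-weighted average of client means.
    """
    dim = len(means[0])
    mu, sigma = [], []
    for j in range(dim):
        precision = sum(w / s[j] ** 2 for w, s in zip(weights, stds))
        weighted_mean = sum(
            w * m[j] / s[j] ** 2 for w, m, s in zip(weights, means, stds)
        )
        mu.append(weighted_mean / precision)
        sigma.append(sqrt(1.0 / precision))
    return mu, sigma


# Hypothetical example: two clients, one parameter, equal weights.
client_means = [[0.0], [2.0]]
client_stds = [[1.0], [2.0]]
client_weights = [0.5, 0.5]

w2_mu, w2_sigma = w2_barycenter(client_means, client_stds, client_weights)
rkl_mu, rkl_sigma = rkl_barycenter(client_means, client_stds, client_weights)
```

In this toy case the W2 barycenter has mean 1.0 and standard deviation 1.5, while the precision-weighted variant pulls the mean toward the more confident client (about 0.4) and yields a variance of about 1.6.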

Abstract

Federated learning (FL) is a widely used and impactful distributed optimization framework that achieves consensus through averaging locally trained models. While effective, this approach may not align well with Bayesian inference, where the model space has the structure of a distribution space. Taking an information-geometric perspective, we reinterpret FL aggregation as the problem of finding the barycenter of local posteriors using a prespecified divergence metric, minimizing the average discrepancy across clients. This perspective provides a unifying framework that generalizes many existing methods and offers crisp insights into their theoretical underpinnings. We then propose BA-BFL, an algorithm that retains the convergence properties of Federated Averaging in non-convex settings. In non-independent and identically distributed (non-i.i.d.) scenarios, we conduct extensive comparisons with statistical aggregation techniques, showing that BA-BFL achieves performance comparable to state-of-the-art methods while offering a geometric interpretation of the aggregation phase. Additionally, we extend our analysis to Hybrid Bayesian Deep Learning, exploring the impact of Bayesian layers on uncertainty quantification and model calibration.
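The aggregation step described above can be written as a weighted barycenter problem. In the sketch below, $p_k$ is the local posterior of client $k$, $w_k$ is a nonnegative client weight, $\mathcal{Q}$ is the family of candidate global posteriors, and $D$ is the chosen divergence. This notation, and in particular the argument order of $D$, is an assumption made for illustration and is not taken from the paper.

```latex
q^\star \in \operatorname*{arg\,min}_{q \in \mathcal{Q}}
  \sum_{k=1}^{K} w_k \, D\!\left(p_k \,\|\, q\right),
\qquad w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1 .
```

Different choices of $D$ and $\mathcal{Q}$ give different aggregation rules, which is the sense in which the barycenter view can cover several existing methods.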

Paper Structure

This paper contains 15 sections, 1 theorem, 11 equations, 4 figures, 4 tables, 1 algorithm.

Key Result

Theorem 3.1

(Convergence) Under Assumption ass:gaussian_ind, with either reverse KL barycenter (RKLB) or Wasserstein barycenter (WB) aggregation, BA-BFL preserves the convergence properties of FedAvg established by Karimireddy et al. (2020) for non-convex objectives, with both i.i.d. and non-i.i.d. data.

Figures (4)

  • Figure 1: The BA-BFL framework.
  • Figure 2: Mapping of various aggregation methods to their corresponding barycenter formulations.
  • Figure 3: Bayesian signed-rank test triangle plots comparing aggregation methods on the NLL metric. Each subfigure shows the posterior distribution over the relative performance between two methods.
  • Figure 4: Effect of Bayesian Layers on Uncertainty Quantification and Model Calibration.

Theorems & Definitions (2)

  • Theorem 3.1
  • proof