One-Shot Federated Learning with Bayesian Pseudocoresets

Tim d'Hondt, Mykola Pechenizkiy, Robert Peharz

TL;DR

The paper addresses prohibitive communication costs in federated learning by proposing a one-shot Bayesian FL framework that aggregates local posteriors in function space. It leverages inducing-point–style Bayesian pseudocoresets (BPCs) to compress each client's information into a small set of function-space summaries, which are sent to the server and combined into a global posterior over inducing points, enabling downstream predictions with uncertainty estimates. Empirically, the method, BPC-FL, achieves competitive accuracy and log-likelihood while reducing communication by up to two orders of magnitude, and provides well-calibrated uncertainty; it can also serve as an initialization to speed up traditional Federated Optimization. The work demonstrates the practicality of function-space Bayesian FL and lays out a scalable, one-shot alternative to gradient-based FL with potential privacy and downstream inference benefits.

Abstract

Optimization-based techniques for federated learning (FL) often come with prohibitive communication cost, as high-dimensional model parameters need to be communicated repeatedly between server and clients. In this paper, we follow a Bayesian approach that allows FL to be performed with one-shot communication, by solving the global inference problem as a product of local client posteriors. For models with multi-modal likelihoods, such as neural networks, a naive application of this scheme is hampered, since clients will capture different posterior modes, causing a destructive collapse of the posterior on the server side. Consequently, we explore approximate inference in the function-space representation of client posteriors, which suffers less, or not at all, from multi-modality. We show that distributed function-space inference is tightly related to learning Bayesian pseudocoresets and develop a tractable Bayesian FL algorithm based on this insight. We show that this approach achieves prediction performance competitive with the state of the art while reducing communication cost by up to two orders of magnitude. Moreover, due to its Bayesian nature, our method also delivers well-calibrated uncertainty estimates.
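The "product of local client posteriors" idea can be illustrated with a minimal sketch. It is not the paper's function-space method: it assumes a conjugate Bayesian linear regression in weight space, where every client posterior is a single Gaussian and the product is exact. Under the assumption that client datasets are conditionally independent given the weights, the global posterior is proportional to the product of the $K$ client posteriors divided by $K-1$ copies of the prior. The function names `local_posterior` and `aggregate`, and the prior and noise precisions `alpha` and `beta`, are hypothetical choices made for this illustration.

```python
import numpy as np

def local_posterior(X, y, alpha=1.0, beta=25.0):
    """Natural parameters of one client's Gaussian posterior over weights.

    Assumed model (illustration only): prior N(0, I/alpha), Gaussian noise
    with precision beta. Returns (precision matrix, precision @ mean).
    """
    d = X.shape[1]
    precision = alpha * np.eye(d) + beta * X.T @ X
    shift = beta * X.T @ y
    return precision, shift

def aggregate(posteriors, alpha=1.0):
    """One-shot server step: multiply client posteriors, divide out K-1 priors."""
    K = len(posteriors)
    d = posteriors[0][0].shape[0]
    # Each client counted the prior once; keep only one copy in total.
    precision = sum(p for p, _ in posteriors) - (K - 1) * alpha * np.eye(d)
    shift = sum(s for _, s in posteriors)
    mean = np.linalg.solve(precision, shift)
    return mean, np.linalg.inv(precision)

# Synthetic data split across four clients (made-up values).
rng = np.random.default_rng(0)
w_true = np.array([0.5, -1.0, 2.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(30, 3))
    y = X @ w_true + 0.2 * rng.normal(size=30)
    clients.append((X, y))

# Each client sends its posterior once; the server combines them.
fed_mean, fed_cov = aggregate([local_posterior(X, y) for X, y in clients])

# Reference: posterior computed on the pooled data.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
cen_prec, cen_shift = local_posterior(X_all, y_all)
cen_mean = np.linalg.solve(cen_prec, cen_shift)
cen_cov = np.linalg.inv(cen_prec)
```

In this unimodal setting the one-shot aggregate matches the pooled-data posterior up to floating-point error. The abstract's point is that this breaks down for neural networks, where clients may approximate different posterior modes, which is what motivates moving the aggregation into function space.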

Paper Structure (28 sections, 13 equations, 5 figures, 9 tables, 2 algorithms)

Figures (5)

  • Figure 1: Results on EMNIST-62. The left column shows accuracy (y-axis) vs. communication in float32s (x-axis); the right column shows negative log-likelihood (y-axis) vs. communication in float32s (x-axis). The upper row shows baselines with 10 local steps, the bottom row with 100 local steps. Note our one-shot approach (black $\bullet$) in the upper left corner and our one-shot approach used as FedAvg initialization in red.
  • Figure 2: Synthetic regression: client data is shown in a different color for each client. The upper row presents the server models produced by FedAvg (left) and DOSFL (right). The lower row shows ${\bm{Z}}$ as black crosses. In the lower left, the purple line shows a model learned by running Adam on $q({\bm{\theta}}|\mathcal{C})$, whereas in the lower right the purple line and shaded area show the mean and one standard deviation of the Bayesian predictive distribution estimated from HMC samples of $q({\bm{\theta}}|\mathcal{C})$.
  • Figure 3: 2D classification: the red-blue filling shows each method's predictive distribution. In the bottom row, the crosses represent ${\bm{Z}}$ and their color represents $\hat{{\bm{y}}}_m$.
  • Figure 4: Synthetic regression - small datasets
  • Figure 5: Some example x-values from the server set in EMNIST-62