Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space

Mohsin Hasan, Guojun Zhang, Kaiyang Guo, Xi Chen, Pascal Poupart

TL;DR

Federated Learning (FL) faces calibration challenges when client data are heterogeneous. The authors show that the Bayesian Committee Machine (BCM) can be overconfident in its aggregated predictions and introduce $β$-Predictive Bayes, which interpolates between the BCM product and a predictive mixture using a tunable parameter $β$, followed by distillation to a single deployable model. The method learns $β$ by minimizing the negative log-likelihood (NLL) on a server dataset and demonstrates improved calibration (lower NLL and expected calibration error, ECE) on both classification and regression tasks in a single communication round. The work provides a theoretical calibration analysis and extensive empirical results, showing improved uncertainty estimates in FL with limited communication and heterogeneous data.

Abstract

Federated Learning (FL) involves training a model over a dataset distributed among clients, with the constraint that each client's dataset is localized and possibly heterogeneous. In FL, small and noisy datasets are common, highlighting the need for well-calibrated models that represent the uncertainty of predictions. The closest FL techniques to achieving such goals are the Bayesian FL methods, which collect parameter samples from local posteriors and aggregate them to approximate the global posterior. To improve scalability for larger models, one common Bayesian approach is to approximate the global predictive posterior by multiplying local predictive posteriors. In this work, we demonstrate that this method gives systematically overconfident predictions, and we remedy this by proposing $β$-Predictive Bayes, a Bayesian FL algorithm that interpolates between a mixture and a product of the predictive posteriors, using a tunable parameter $β$. This parameter is tuned to improve the global ensemble's calibration before the ensemble is distilled to a single model. Our method is evaluated on a variety of regression and classification datasets to demonstrate its superior calibration compared with other baselines, even as data heterogeneity increases. Code is available at https://github.com/hasanmohsin/betaPredBayesFL
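
To make the product-versus-mixture contrast concrete, the following is a minimal NumPy sketch for the classification case. It is not the authors' implementation: the exact interpolation used by $β$-Predictive Bayes is not given in this summary, so the geometric (log-linear) form in `beta_interpolate`, the omission of any prior correction in the product, the grid search in `tune_beta`, and all function names are assumptions made for illustration.

```python
import numpy as np

def _normalize(p):
    """Rescale the last axis so each row sums to one."""
    return p / p.sum(axis=-1, keepdims=True)

def product_aggregate(client_probs):
    """Product of client predictive distributions (BCM-style, no prior term).

    client_probs has shape (n_clients, n_points, n_classes).
    """
    log_prod = np.log(np.clip(client_probs, 1e-12, 1.0)).sum(axis=0)
    log_prod -= log_prod.max(axis=-1, keepdims=True)  # numerical stability
    return _normalize(np.exp(log_prod))

def mixture_aggregate(client_probs):
    """Uniform mixture (average) of client predictive distributions."""
    return client_probs.mean(axis=0)

def beta_interpolate(client_probs, beta):
    """ASSUMPTION: log-linear interpolation, beta=1 -> product, beta=0 -> mixture."""
    log_prod = np.log(np.clip(product_aggregate(client_probs), 1e-12, 1.0))
    log_mix = np.log(np.clip(mixture_aggregate(client_probs), 1e-12, 1.0))
    log_p = beta * log_prod + (1.0 - beta) * log_mix
    log_p -= log_p.max(axis=-1, keepdims=True)
    return _normalize(np.exp(log_p))

def nll(probs, labels):
    """Average negative log-likelihood of integer labels."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(np.clip(picked, 1e-12, 1.0))))

def tune_beta(client_probs, labels, grid=None):
    """ASSUMPTION: pick beta by grid search on held-out (server) NLL."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 21)
    losses = [nll(beta_interpolate(client_probs, b), labels) for b in grid]
    return float(grid[int(np.argmin(losses))])

# Three clients that each predict [0.7, 0.3] on ten points.
client_probs = np.tile(np.array([0.7, 0.3]), (3, 10, 1))
# Server labels where class 0 occurs 70% of the time.
labels = np.array([0] * 7 + [1] * 3)

print(product_aggregate(client_probs)[0])  # sharper than any single client
print(mixture_aggregate(client_probs)[0])  # same as each client here
print(tune_beta(client_probs, labels))
```

In this toy setting the three clients agree, so multiplying their predictions pushes the confidence in class 0 well above 0.7 even though no new evidence was added, which is the overconfidence the abstract describes. The mixture leaves the confidence at 0.7, and the NLL-based search favors the mixture end of the range.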

Paper Structure (32 sections, 20 theorems, 28 equations, 2 figures, 6 tables, 1 algorithm)

Key Result

Lemma 1

Assume $x^* \in R$. Under some mild conditions on the kernel function, and under the assumption of Gaussian or Laplacian observation noise, as the number of data-points increases $\sigma^2(x^*) \to \sigma^2_o$ (and in addition, the predictive mean converges to the true function value: $\mu(x^*) \to$ …

Figures (2)

  • Figure 1: NLL on the classification datasets with increasing heterogeneity (tested with $h \in \{0.0, 0.3, 0.6, 0.9\}$). Averages and standard error over 10 seeds are reported. Omitted values (e.g., for FedPA on EMNIST) denote results where NLL diverged.
  • Figure 2: ECE on the classification datasets with increasing heterogeneity (tested with $h \in \{0.0, 0.3, 0.6, 0.9\}$). Averages and standard error over 10 seeds are reported.
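
For readers unfamiliar with the ECE metric reported in Figure 2, the sketch below shows one common way to compute it for a classifier: group predictions into equal-width confidence bins and average the gap between accuracy and confidence, weighted by bin size. The bin count, the binning scheme, and the function name are assumptions; this page does not state which variant the authors used.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for class probabilities of shape (n_points, n_classes)."""
    confidences = probs.max(axis=1)            # confidence of the predicted class
    predictions = probs.argmax(axis=1)
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Gap between empirical accuracy and mean confidence in this bin,
            # weighted by the fraction of points that fall in the bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return float(ece)

# Ten predictions at 80% confidence for class 0.
probs = np.tile(np.array([0.8, 0.2]), (10, 1))
calibrated_labels = np.array([0] * 8 + [1] * 2)   # 80% of predictions correct
underconfident_labels = np.zeros(10, dtype=int)   # 100% of predictions correct

print(expected_calibration_error(probs, calibrated_labels))
print(expected_calibration_error(probs, underconfident_labels))
```

The first call yields an ECE of essentially zero because accuracy matches confidence, while the second yields roughly 0.2, the gap between 100% accuracy and 80% confidence.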

Theorems & Definitions (29)

  • Lemma 1: choiGPconsistency
  • Lemma 2
  • Theorem 1: BCM, homogeneous
  • Theorem 2: BCM, heterogeneous
  • Theorem 3: mixture model, homogeneous
  • Theorem 4: mixture model, heterogeneous
  • Theorem 5: BCM, homogeneous, classification
  • Theorem 6: BCM, heterogeneous, classification
  • Theorem 7: mixture model, heterogeneous, classification
  • Theorem 8: mixture model, homogeneous, classification
  • ...and 19 more