Calibrated One Round Federated Learning with Bayesian Inference in the Predictive Space
Mohsin Hasan, Guojun Zhang, Kaiyang Guo, Xi Chen, Pascal Poupart
TL;DR
Federated Learning (FL) faces calibration challenges when client data are heterogeneous. The authors show that the Bayesian Committee Machine (BCM) can be overconfident in its aggregated predictions and introduce $β$-Predictive Bayes, which interpolates between the BCM product and a mixture of the local predictive posteriors using a tunable parameter $β$, then distills the resulting ensemble into a single deployable model. The method learns $β$ by minimizing the negative log-likelihood (NLL) on a server dataset and, within a single communication round, improves calibration (lower NLL and expected calibration error, ECE) on both classification and regression tasks. The work provides a theoretical calibration analysis and extensive empirical results, showing improved uncertainty estimates in FL under limited communication and heterogeneous data. This approach enables more reliable probabilistic predictions in practical FL deployments.
Abstract
Federated Learning (FL) involves training a model over a dataset distributed among clients, with the constraint that each client's dataset is localized and possibly heterogeneous. In FL, small and noisy datasets are common, highlighting the need for well-calibrated models that represent the uncertainty of predictions. The FL techniques closest to achieving these goals are Bayesian FL methods, which collect parameter samples from local posteriors and aggregate them to approximate the global posterior. To improve scalability for larger models, one common Bayesian approach is to approximate the global predictive posterior by multiplying local predictive posteriors. In this work, we demonstrate that this method gives systematically overconfident predictions, and we remedy this by proposing $β$-Predictive Bayes, a Bayesian FL algorithm that interpolates between a mixture and a product of the predictive posteriors, using a tunable parameter $β$. This parameter is tuned to improve the global ensemble's calibration before the ensemble is distilled to a single model. We evaluate our method on a variety of regression and classification datasets and show that it is better calibrated than the baselines, even as data heterogeneity increases. Code is available at https://github.com/hasanmohsin/betaPredBayesFL
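
To make the interpolation between the product and the mixture concrete, the following is a minimal sketch for the classification case. It is not the authors' implementation: the abstract does not give the exact functional form, so the sketch assumes a log-linear interpolation in which $β = 1$ recovers the normalized product and $β = 0$ recovers the mixture. It also assumes a uniform prior over classes (so any prior-correction term in the product is a constant) and a simple grid search for $β$ on a labeled server dataset. All function names are hypothetical.

```python
import numpy as np


def log_softmax(z, axis=-1):
    """Numerically stable log-softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def log_mixture(client_log_probs):
    """Log of the equally weighted average of client predictive distributions.

    client_log_probs has shape (K, N, C): K clients, N inputs, C classes.
    """
    num_clients = client_log_probs.shape[0]
    m = client_log_probs.max(axis=0)
    summed = np.exp(client_log_probs - m).sum(axis=0)
    return m + np.log(summed) - np.log(num_clients)


def log_product(client_log_probs):
    """Log of the renormalized product of client predictive distributions.

    Assumption: a uniform class prior, so no explicit prior correction is needed.
    """
    return log_softmax(client_log_probs.sum(axis=0), axis=-1)


def log_beta_predictive(client_log_probs, beta):
    """Assumed log-linear interpolation: beta=1 is the product, beta=0 the mixture."""
    mixed = (beta * log_product(client_log_probs)
             + (1.0 - beta) * log_mixture(client_log_probs))
    return log_softmax(mixed, axis=-1)


def nll(log_probs, labels):
    """Mean negative log-likelihood of integer labels under log_probs (N, C)."""
    return -log_probs[np.arange(len(labels)), labels].mean()


def tune_beta(client_log_probs, labels, num_points=21):
    """Pick the beta in [0, 1] with the lowest NLL on the server data (grid search)."""
    grid = np.linspace(0.0, 1.0, num_points)
    losses = [nll(log_beta_predictive(client_log_probs, b), labels) for b in grid]
    return float(grid[int(np.argmin(losses))])
```

Given per-client log-probabilities on the server inputs, `tune_beta` returns the grid value of $β$ with the lowest server NLL. The resulting ensemble prediction `log_beta_predictive(client_log_probs, beta)` could then serve as the soft target for distillation into a single model.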
