FedHB: Hierarchical Bayesian Federated Learning

Minyoung Kim; Timothy Hospedales

TL;DR

This work proposes a hierarchical Bayesian approach to Federated Learning (FL) that models the generative process of clients' local data: each client's local model is a random variable governed by a higher-level global variate.

Abstract

We propose a novel hierarchical Bayesian approach to Federated Learning (FL), in which the generative process of clients' local data is described by hierarchical Bayesian modeling: each client's local model is a random variable governed by a higher-level global variate. Variational inference in this model leads to an optimisation problem whose block-coordinate descent solution is a distributed algorithm that is separable over clients and never requires them to reveal their private data, making it fully compatible with FL. The block-coordinate algorithm also takes particular forms that subsume well-known FL algorithms, including Fed-Avg and Fed-Prox, as special cases. Beyond the modeling and derivations, we offer a convergence analysis showing that our block-coordinate FL algorithm converges to a (local) optimum of the objective at the rate of $O(1/\sqrt{t})$, the same rate as regular (centralised) SGD, and a generalisation error analysis proving that the test error of our model on unseen data is guaranteed to vanish as the training data size increases, making the model asymptotically optimal.
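To make the "special cases" remark concrete, the following is a minimal sketch of the two baseline algorithms the abstract names, not of FedHB itself. It alternates a client step and a server step on a toy problem. The quadratic local loss, the function names (`local_update`, `server_update`, `run_rounds`), and the parameters `mu`, `lr`, `steps`, and `rounds` are all illustrative assumptions; `mu` plays the role of the standard Fed-Prox proximal weight, and `mu = 0` gives a Fed-Avg-style update.

```python
from typing import List

Vector = List[float]


def local_update(w_global: Vector, client_mean: Vector, mu: float,
                 lr: float = 0.1, steps: int = 50) -> Vector:
    """Client step on a toy loss (an assumption for illustration):
    0.5 * ||w - client_mean||^2 + (mu / 2) * ||w - w_global||^2.
    Only client_mean (the private data summary) is used locally."""
    w = list(w_global)
    for _ in range(steps):
        # Gradient of the local loss plus the proximal pull towards w_global.
        w = [wi - lr * ((wi - ci) + mu * (wi - gi))
             for wi, ci, gi in zip(w, client_mean, w_global)]
    return w


def server_update(client_weights: List[Vector]) -> Vector:
    """Server step: coordinate-wise average of the client models."""
    n = len(client_weights)
    return [sum(column) / n for column in zip(*client_weights)]


def run_rounds(client_means: List[Vector], mu: float, rounds: int = 30) -> Vector:
    """Alternate client and server steps; clients send weights, never data."""
    w_global = [0.0] * len(client_means[0])
    for _ in range(rounds):
        local_models = [local_update(w_global, c, mu) for c in client_means]
        w_global = server_update(local_models)
    return w_global


# Hypothetical per-client data summaries.
client_means = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]
w_fedavg = run_rounds(client_means, mu=0.0)   # no proximal term
w_fedprox = run_rounds(client_means, mu=1.0)  # with proximal term
```

On this toy problem both settings drive the global weights towards the average of the client summaries; the proximal term only changes how far each client moves away from the global model within a round.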

Paper Structure (37 sections, 4 theorems, 110 equations, 5 figures, 9 tables, 10 algorithms)

Introduction
Related Work and Our Contributions
Related Work
Outline of Our Contributions
Bayesian FL: General Framework
From Variational Inference to Federated Learning Algorithm
Formalisation of Global Prediction and Personalisation Tasks
Bayesian FL: Two Concrete Models
Normal-Inverse-Wishart (NIW) Model
Mixture Model
Theoretical Analysis
Convergence Analysis
Generalisation Error Bound
Computational Complexity Analysis
Evaluation
...and 22 more sections

Key Result

Theorem 1

We denote the objective function in (eq:elbo) by $f(x)$, where $x = [x_0, x_1, \dots, x_N]$ corresponds to the variational parameters $x_0 := L_0$, $x_1 := L_1$, …, $x_N := L_N$. Let $\eta_t = \overline{L} + \sqrt{t}$ for some constant $\overline{L}$, and $\overline{x}^T = \frac{1}{T}\sum_{t=1}^T x^t$, […] where $x^*$ is the (local) optimum, $D$ and $R_f$ are some constants, and the expectation is taken […]

Figures (5)

Figure 1: Graphical models. (a) Plate view of iid clients. (b) Individual client data with input images $x$ given and only $p(y|x)$ modeled. (c) and (d) Global prediction and personalisation as probabilistic inference problems (shaded nodes = evidence, red nodes = targets to infer, $x^*$ = test input in global prediction, $D^p$ = training data for personalisation, and $x^p$ = test input for personalisation).
Figure 2: Hyperparameter sensitivity analysis and comparison with simple ensemble baselines.
Figure 3: CIFAR-100 training dynamics. (Left) Training curves over FL rounds. (Right) Personalisation training curves. Test accuracies are superimposed.
Figure 4: MNIST training convergence with different numbers of participating clients. (Left) NIW and (Right) Mixture ($K=2$).
Figure 5: Comparison between our mixture model and ensemble baselines ($K$ varied) on CIFAR-100.

Theorems & Definitions (8)

Theorem 1: Convergence analysis
Remark
Theorem 2: Generalisation error bound
Remark
Remark
Lemma 3
Remark
Lemma 4: From the proof of Lemma 4.1 in bai20
