FedFisher: Leveraging Fisher Information for One-Shot Federated Learning

Divyansh Jhunjhunwala, Shiqiang Wang, Gauri Joshi

TL;DR

This work theoretically analyzes FedFisher for two-layer over-parameterized ReLU neural networks and shows that the error of the one-shot FedFisher global model becomes vanishingly small as the width of the neural networks and the amount of local training at clients increase.

Abstract

Standard federated learning (FL) algorithms typically require multiple rounds of communication between the server and the clients, which has several drawbacks, including requiring constant network connectivity, repeated investment of computational resources, and susceptibility to privacy attacks. One-Shot FL is a new paradigm that aims to address these drawbacks by enabling the server to train a global model in a single round of communication. In this work, we present FedFisher, a novel algorithm for one-shot FL that makes use of Fisher information matrices computed on local client models, motivated by a Bayesian perspective of FL. First, we theoretically analyze FedFisher for two-layer over-parameterized ReLU neural networks and show that the error of our one-shot FedFisher global model becomes vanishingly small as the width of the neural networks and the amount of local training at clients increase. Next, we propose practical variants of FedFisher using the diagonal Fisher and K-FAC approximations of the full Fisher and highlight their communication and compute efficiency for FL. Finally, we conduct extensive experiments on various datasets, which show that these variants of FedFisher consistently improve over competing baselines.
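
To make the role of the diagonal Fisher concrete, here is a minimal sketch of a Fisher-weighted aggregation step. It is not the authors' implementation. It assumes each local posterior is approximated by a Gaussian centered at the client's trained weights with precision given by that client's diagonal Fisher (a Laplace-style approximation, which this summary does not spell out). Under that assumption the mode of the product of Gaussians has an elementwise closed form. The function name `fisher_weighted_average` and the `eps` damping term are hypothetical choices made for illustration.

```python
import numpy as np

def fisher_weighted_average(local_weights, local_fishers, eps=1e-8):
    """Elementwise Fisher-weighted average of client weights.

    Assumption (not from the paper text above): each client i contributes a
    Gaussian N(w_i, diag(f_i)^-1), so the mode of the product of these
    Gaussians is sum_i(f_i * w_i) / sum_i(f_i), computed per coordinate.
    `eps` is a small damping term that avoids division by zero when a
    coordinate has zero Fisher information on every client.
    """
    weights = np.stack(local_weights)   # shape: (num_clients, num_params)
    fishers = np.stack(local_fishers)   # shape: (num_clients, num_params)
    numerator = np.sum(fishers * weights, axis=0)
    denominator = np.sum(fishers, axis=0) + eps
    return numerator / denominator

# Two hypothetical clients with two parameters each.
w_clients = [np.array([1.0, 0.0]), np.array([3.0, 4.0])]
f_clients = [np.array([1.0, 1.0]), np.array([3.0, 1.0])]
w_global = fisher_weighted_average(w_clients, f_clients)
```

In this toy example the first coordinate is pulled toward the second client, whose Fisher value is three times larger, while the second coordinate is a plain average because both clients report equal Fisher values. With a full or K-FAC Fisher the same idea would require solving a linear system rather than dividing elementwise.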

Paper Structure

This paper contains 62 sections, 21 theorems, 90 equations, 4 figures, 6 tables, 1 algorithm.

Key Result

Proposition 1

(Global Posterior Decomposition [al2020federated]) Under the flat prior $\mathbb{P}(\bm{W}) \propto 1$, the global posterior decomposes into a product of local posteriors, i.e., $\mathbb{P}(\bm{W}|\mathcal{D}) \propto \prod_{i=1}^M \mathbb{P}(\bm{W}|\mathcal{D}_i)$. Furthermore, the modes of the globa
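
The decomposition stated in Proposition 1 can be sketched with two applications of Bayes' rule. This sketch additionally assumes that the client datasets $\mathcal{D}_1, \dots, \mathcal{D}_M$ are conditionally independent given $\bm{W}$, which the excerpt above does not state explicitly.

```latex
\begin{align*}
\mathbb{P}(\bm{W}|\mathcal{D})
  &\propto \mathbb{P}(\mathcal{D}|\bm{W})\,\mathbb{P}(\bm{W})
   && \text{(Bayes' rule)} \\
  &\propto \prod_{i=1}^M \mathbb{P}(\mathcal{D}_i|\bm{W})
   && \text{(flat prior; conditional independence, assumed)} \\
  &\propto \prod_{i=1}^M \frac{\mathbb{P}(\bm{W}|\mathcal{D}_i)\,\mathbb{P}(\mathcal{D}_i)}{\mathbb{P}(\bm{W})}
   && \text{(Bayes' rule on each client)} \\
  &\propto \prod_{i=1}^M \mathbb{P}(\bm{W}|\mathcal{D}_i)
   && \text{(} \mathbb{P}(\mathcal{D}_i) \text{ and } \mathbb{P}(\bm{W}) \text{ do not depend on } \bm{W} \text{)}
\end{align*}
```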

Figures (4)

  • Figure 1: Empirical validation of the FedFisher error theorem (label `theorem:fisher_avg_error`) in a synthetic setting. For a fixed number of local steps (error-vs-width panel), the error for FedFisher decreases as the width of the model increases. For a fixed width (error-vs-local-steps panel), the error first decreases and then increases as the number of local steps increases.
  • Figure 2: Results of performing $5$ rounds of local training and aggregation across different datasets for $\alpha = 0.1$ and $M = 5$. FedFisher variants offer additional utility in multi-round settings and continue to improve over baselines.
  • Figure 3: Test accuracy results on different datasets, keeping $\alpha = 0.3$ fixed and varying the number of clients $M$. FedFisher variants, especially FedFisher(K-FAC), consistently outperform other baselines.
  • Figure 4: Reconstructed images when (a) the server has access to the local model at the first client and (b) the server has access to the global model and the K-FAC information of the first client. The goal is to generate images corresponding to digit $3$.

Theorems & Definitions (34)

  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Definition 1
  • Theorem 1
  • Corollary 1
  • Proposition 1
  • Proposition 2
  • Lemma 1
  • Definition 2
  • ...and 24 more