Overcoming the Challenges of Batch Normalization in Federated Learning

Rachid Guerraoui; Rafael Pinot; Geovani Rizk; John Stephan; François Taiani

TL;DR

This paper tackles the difficulties of applying BatchNorm in federated learning under heterogeneous data distributions. It introduces Federated BatchNorm (FBN), which uses shared running statistics and a bias-corrected update to align training-time normalization with the centralized setting. This preserves data separability and closely matches the performance of centralized BatchNorm. Through CIFAR-10 experiments across varying levels of heterogeneity, the authors show that FBN outperforms Naive BatchNorm and FixBN and effectively mitigates external covariate shift. They also establish the robustness of FBN to Byzantine threats by integrating robust server-side aggregation. This achieves resilience against several adversarial attacks and highlights the practical viability of BatchNorm in distributed training.
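The TL;DR says FBN aligns training-time normalization with the centralized setting through shared statistics. Here is one standard way to see that per-client statistics can, in principle, reproduce centralized batch statistics. Suppose each client reports its sample count, per-feature mean, and biased variance. The law of total variance then lets a server recover the exact mean and variance of the union of all batches. The sketch below illustrates that identity under this assumption. It is not the paper's FBN update, the bias-corrected running-statistics rule is not reproduced here, and the function names are hypothetical.

```python
import numpy as np

def pooled_stats(client_batches):
    """Combine per-client batch statistics into the statistics of their union.

    Hypothetical helper: each client contributes its count, per-feature mean,
    and biased (ddof=0) variance, which is what BatchNorm uses during training.
    """
    counts = np.array([len(b) for b in client_batches], dtype=float)
    means = np.stack([b.mean(axis=0) for b in client_batches])
    variances = np.stack([b.var(axis=0) for b in client_batches])
    weights = (counts / counts.sum())[:, None]
    global_mean = (weights * means).sum(axis=0)
    # Law of total variance: weighted within-client variance
    # plus the weighted spread of the client means around the global mean.
    global_var = (weights * (variances + (means - global_mean) ** 2)).sum(axis=0)
    return global_mean, global_var

def naive_mean_of_variances(client_batches):
    # Plain averaging of local variances ignores the spread between client means.
    return np.mean([b.var(axis=0) for b in client_batches], axis=0)

rng = np.random.default_rng(0)
# Toy extreme heterogeneity: two clients, each holding one class centered at -3 or +3,
# with unequal batch sizes.
clients = [rng.normal(-3.0, 1.0, size=(48, 2)), rng.normal(3.0, 1.0, size=(80, 2))]

mu, var = pooled_stats(clients)
union = np.concatenate(clients)
print(np.allclose(mu, union.mean(axis=0)), np.allclose(var, union.var(axis=0)))  # True True
print(naive_mean_of_variances(clients), var)
```

Under heterogeneity, the naive average of local variances stays near the within-class variance. The pooled estimate also captures the large spread between client means, which is what a centralized batch would see.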

Abstract

Batch normalization has proven to be a highly effective mechanism for accelerating training and improving the accuracy of deep neural networks in centralized environments. Yet the scheme faces significant challenges in federated learning, especially under high data heterogeneity. The main challenges arise from external covariate shifts and inconsistent statistics across clients. In this paper, we introduce Federated BatchNorm (FBN), a novel scheme that restores the benefits of batch normalization in federated learning. FBN ensures that batch normalization during training is consistent with what would be achieved in a centralized execution. It hence preserves the distribution of the data and provides running statistics that accurately approximate the global statistics. FBN thereby reduces the external covariate shift and matches the evaluation performance of the centralized setting. We also show that, with a slight increase in complexity, FBN can be made robust to erroneous statistics and potential adversarial attacks.
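The abstract notes that FBN can be robustified against erroneous statistics and adversarial attacks, and Figure 4 pairs it with median aggregation. The sketch below shows why a robust aggregation rule helps. It is not the paper's robust FBN algorithm. It compares plain averaging with coordinate-wise median aggregation when f = 3 of n = 10 clients send corrupted vectors, which could stand for gradients or reported BatchNorm statistics. The adversary model (sending the negated honest average), the toy data, and all names are hypothetical assumptions for illustration.

```python
import numpy as np

def mean_aggregate(vectors):
    # Non-robust baseline: plain average of all received client vectors.
    return np.mean(vectors, axis=0)

def coordinate_wise_median(vectors):
    # Robust rule: take the median independently for each coordinate.
    return np.median(vectors, axis=0)

rng = np.random.default_rng(0)
n, f, d = 10, 3, 5                  # clients, adversarial clients, vector dimension
true_vector = np.ones(d)            # hypothetical quantity the honest clients estimate
honest = true_vector + 0.1 * rng.standard_normal((n - f, d))
# Hypothetical sign-flipping-style adversaries: each sends the negated honest average.
byzantine = np.tile(-honest.mean(axis=0), (f, 1))
received = np.vstack([honest, byzantine])

print("mean:  ", mean_aggregate(received))
print("median:", coordinate_wise_median(received))
```

With 7 honest and 3 corrupted values per coordinate, the median is the average of two honest order statistics, so it stays within the honest range. The mean is dragged to roughly 0.4 of the honest value.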

Paper Structure

This paper contains 16 sections, 11 equations, 13 figures, 2 tables, and 3 algorithms.

Introduction
Contributions
Related Work
Background on BatchNorm in the Centralized Setting
Bottlenecks of BatchNorm in Distributed Environments
Federated BatchNorm (FBN)
On the Robustness of Federated BatchNorm
Conclusion
Federated BatchNorm for iterative algorithms
Distributed Stochastic Gradient Descent With FBN
Additional Experimental results
Experimental setup
Additional experiments on FBN using Dirichlet Heterogeneity metric
Comprehensive Results on Robustness of FBN
Results with $\gamma$-similarity
...and 1 more section

Figures (13)

Figure 1: Points normalized with BatchNorm in a centralized setting (a) and with Naive BatchNorm in a federated setting (b). In the latter, the points are distributed heterogeneously: each client holds a single class, shown in its own color. In the centralized setting, the data is properly normalized. With Naive BatchNorm, normalization fails, ultimately making the points non-separable.
Figure 2: (a) and (b) show the points normalized by FBN and FixBN, respectively, under extreme heterogeneity. (c) covers different heterogeneity regimes. It shows the average distance between two sets of points: those normalized in the federated setting by each of FBN, FixBN, and Naive BatchNorm, and the same points normalized by BatchNorm in the centralized setting. For all schemes, $\beta = 0.1$.
Figure 3: (a) Final accuracy of DSGD using FBN, FixBN, and Naive BatchNorm versus heterogeneity; (b) evolution of test accuracy for $\gamma = 0.01$; (c) evolution of test accuracy under extreme heterogeneity ($\gamma = 0$).
Figure 4: Performance of FBN vs. Naive BatchNorm on CIFAR-10 with adversarial clients in two heterogeneity settings. Out of $n = 10$ clients, $f = 3$ are adversarial and execute the SF attack [allen2020byzantine]. Naive BatchNorm is strongly impacted by the attack, even when protected by median aggregation [yin2018byzantine]. By contrast, FBN (ours) successfully defeats the attack under the same protection.
Figure 5: (a) Final accuracy of DSGD using FBN, FixBN, and Naive BatchNorm versus heterogeneity; (b) evolution of test accuracy for $\alpha = 0.1$; (c) evolution of test accuracy under extreme heterogeneity ($\gamma = 0$).
...and 8 more figures
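Figures 1 and 2 contrast Naive BatchNorm, where each client normalizes with its own batch statistics, against centralized BatchNorm. Figure 2(c) measures the average distance between the two sets of normalized points. The toy sketch below mimics that comparison on synthetic 2-D data. The data, the `mix` heterogeneity knob, and all function names are hypothetical. They do not correspond to the paper's γ-similarity setup or to its FBN and FixBN schemes.

```python
import numpy as np

EPS = 1e-5  # small constant added to the variance, as in standard BatchNorm

def normalize(x, mean, var):
    # Training-mode BatchNorm without the learnable scale and shift.
    return (x - mean) / np.sqrt(var + EPS)

def make_clients(mix, n=64, seed=0):
    """Two clients and two classes centered at -3 and +3 (hypothetical toy data).

    `mix` is a hypothetical heterogeneity knob: the fraction of each client's
    points taken from the other class (0 = one class per client,
    0.5 = both clients hold the same class mixture).
    """
    rng = np.random.default_rng(seed)
    k = int(round(mix * n))
    a = rng.normal(-3.0, 1.0, size=(n, 2))
    b = rng.normal(3.0, 1.0, size=(n, 2))
    client1 = np.concatenate([a[k:], b[:k]])
    client2 = np.concatenate([b[k:], a[:k]])
    return [client1, client2]

def avg_distance_to_centralized(clients):
    """Mean per-point distance between Naive-BatchNorm and centralized-BatchNorm outputs."""
    data = np.concatenate(clients)
    central = normalize(data, data.mean(axis=0), data.var(axis=0))
    # Same row order as `data`: client 1's points first, then client 2's.
    naive = np.concatenate([normalize(c, c.mean(axis=0), c.var(axis=0)) for c in clients])
    return float(np.linalg.norm(naive - central, axis=1).mean())

def centroid_gap_naive(clients):
    # Local normalization centers every client at 0. With one class per client,
    # the class centroids therefore coincide and the classes overlap.
    z = [normalize(c, c.mean(axis=0), c.var(axis=0)) for c in clients]
    return float(np.linalg.norm(z[0].mean(axis=0) - z[1].mean(axis=0)))

for mix in (0.0, 0.25, 0.5):
    print(mix, avg_distance_to_centralized(make_clients(mix)))
print("centroid gap, one class per client:", centroid_gap_naive(make_clients(0.0)))
```

With one class per client (`mix = 0`), the naively normalized class centroids coincide, echoing the loss of separability in Figure 1(b). The distance to the centralized normalization is also large in that case. As the clients' data becomes more similar, local statistics approach the global ones and the distance tends to shrink.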

Theorems & Definitions (1)

Remark 4.1
