Hybrid Batch Normalisation: Resolving the Dilemma of Batch Normalisation in Federated Learning

Hongyao Chen; Tianyang Xu; Xiaojun Wu; Josef Kittler

TL;DR

Federated learning with non-IID data undermines Batch Normalisation due to biased global statistics. The authors introduce Hybrid Batch Normalisation (HBN), which decouples the update of BN statistics from learnable parameters and uses a learnable per-channel α to blend local batch statistics with unbiased global statistics computed at the server. Key contributions include an asynchronous, unbiased global-statistics aggregation, a per-client HBN normalisation with α, and extensive experiments showing robustness to heterogeneity and small batch sizes across common FL architectures. HBN acts as a practical plug-in that improves FL performance with modest communication and computation overhead, applicable across diverse networks and datasets.
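To see why naively averaged statistics are biased under heterogeneity, the following minimal sketch compares two ways of combining per-client means and variances for a single channel. It is an illustration only, not the paper's aggregation algorithm: the function names, the toy client data, and the use of sample-count weighting with the law of total variance are all assumptions introduced here.

```python
from statistics import fmean, pvariance


def pooled_stats(client_stats):
    """Combine (count, mean, population variance) triples into exact global statistics.

    Hypothetical helper: weights by sample count and adds the between-client
    term (law of total variance). Not taken from the paper.
    """
    total = sum(n for n, _, _ in client_stats)
    mean = sum(n * m for n, m, _ in client_stats) / total
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in client_stats) / total
    return mean, var


def naive_stats(client_stats):
    """Unweighted average of client means and variances (biased when clients differ)."""
    mean = fmean([m for _, m, _ in client_stats])
    var = fmean([v for _, _, v in client_stats])
    return mean, var


# Two toy clients with very different activation distributions (non-IID).
client_a = [0.0, 1.0, 2.0]
client_b = [10.0, 11.0, 12.0, 13.0, 14.0]
stats = [(len(x), fmean(x), pvariance(x)) for x in (client_a, client_b)]

pooled_mean, pooled_var = pooled_stats(stats)
naive_mean, naive_var = naive_stats(stats)

# Ground truth computed on the union of all client data.
true_mean = fmean(client_a + client_b)
true_var = pvariance(client_a + client_b)
```

In this toy case the pooled estimate matches the statistics of the combined data, while the naive average ignores the gap between client means and therefore underestimates the global variance.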

Abstract

Batch Normalisation (BN) is widely used in conventional deep neural network training to harmonise the input-output distributions for each batch of data. However, federated learning, a distributed learning paradigm, faces the challenge of dealing with non-independent and identically distributed data among the client nodes. Due to the lack of a coherent methodology for updating BN statistical parameters, standard BN degrades the federated learning performance. To this end, it is urgent to explore an alternative normalisation solution for federated learning. In this work, we resolve the dilemma of the BN layer in federated learning by developing a customised normalisation approach, Hybrid Batch Normalisation (HBN). HBN separates the update of statistical parameters (i.e., means and variances used for evaluation) from that of learnable parameters (i.e., parameters that require gradient updates), obtaining unbiased estimates of global statistical parameters in distributed scenarios. In contrast with the existing solutions, we emphasise the supportive power of global statistics for federated learning. The HBN layer introduces a learnable hybrid distribution factor, allowing each computing node to adaptively mix the statistical parameters of the current batch with the global statistics. Our HBN can serve as a powerful plugin to advance federated learning performance. It reflects promising merits across a wide range of federated learning settings, especially for small batch sizes and heterogeneous data.
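The mixing of batch and global statistics described above can be sketched for one channel as follows. This is a hedged illustration under stated assumptions: the summary does not give the exact HBN formula, so the convex combination weighted by `alpha`, the function name `hybrid_normalise`, and the `eps` value are hypothetical. In the paper the factor is learnable per channel; here it is a fixed number, and the affine scale and shift of BN are omitted.

```python
import math


def hybrid_normalise(batch, global_mean, global_var, alpha, eps=1e-5):
    """Normalise one channel using a mix of batch and global statistics.

    ASSUMPTION: the mix is a convex combination with alpha in [0, 1];
    alpha = 1 uses only the current batch, alpha = 0 only the global values.
    """
    n = len(batch)
    batch_mean = sum(batch) / n
    batch_var = sum((x - batch_mean) ** 2 for x in batch) / n

    # Blend the two sets of statistics before normalising.
    mean = alpha * batch_mean + (1.0 - alpha) * global_mean
    var = alpha * batch_var + (1.0 - alpha) * global_var
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


batch = [1.0, 2.0, 3.0, 4.0]
pure_batch = hybrid_normalise(batch, global_mean=0.0, global_var=1.0, alpha=1.0)
pure_global = hybrid_normalise(batch, global_mean=0.0, global_var=1.0, alpha=0.0)
mixed = hybrid_normalise(batch, global_mean=0.0, global_var=1.0, alpha=0.5)
```

With `alpha=1.0` the output is centred on the batch itself, as in standard BN during training; with `alpha=0.0` the batch is normalised purely by the supplied global statistics, which is less sensitive to a small or skewed batch.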

Paper Structure (22 sections, 16 equations, 7 figures, 10 tables, 1 algorithm)

Introduction
Related Work
Approach
Formulation
Global Server End: Obtaining Unbiased Statistical Parameters
Local Client End: Performing Hybrid Batch Normalisation
Implementation
Experiments
Implementation Details
Performance Comparison
Ablation Study
Discussion
Conclusion
Problem Formulation
Statistical Bias of Vanilla BN in FL
...and 7 more sections

Figures (7)

Figure 1: A comparison of our Hybrid Batch Normalisation (HBN) with standard Batch Normalisation (BN) and Group Normalisation (GN) in federated learning settings. (a) Classification error rate of Simple-CNN on CIFAR-100 vs. data heterogeneity (controlled by a Dirichlet distribution coefficient). The batch size is $16$. (b) Classification error rate of Simple-CNN on CIFAR-100 vs. batch size. The Dirichlet distribution coefficient is $0.6$. For other implementation details, please refer to Section 4.1.
Figure 2: Normalisation methods for two toy FL clusters.
Figure 3: Label distributions on CIFAR-10 with 100 clients under different Dirichlet coefficients $\phi \in \{0.6, 0.1\}$.
Figure 4: The sensitivity of different normalisation methods to batch size on CIFAR-100 with $\phi = 0.6$ by Simple-CNN.
Figure 5: Impact of the hyper-parameter $\lambda$ under different client activation ratios $C$.
...and 2 more figures
