FedWon: Triumphing Multi-domain Federated Learning Without Normalization

Weiming Zhuang; Lingjuan Lyu

TL;DR

This paper tackles multi-domain federated learning (FL), where client data come from different domains and batch normalization (BN) statistics diverge across clients, hindering convergence. It proposes FedWon, a normalization-free FL approach that removes BN layers and reparameterizes convolutional layers with scaled weight standardization (WSConv), which keeps training stable under FedAvg-style aggregation. Across five datasets and multiple architectures, FedWon consistently outperforms FedAvg and the state-of-the-art FedBN, with accuracy gains exceeding 10% on some domains and robust performance even at batch size 1. It applies to both cross-silo and cross-device FL and can also address skewed label distributions. The work includes comprehensive ablations and demonstrates FedWon's domain generalization, feature-map alignment, and applicability to medical imaging, which supports its practical use for privacy-preserving, distributed learning in heterogeneous environments.

Abstract

Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d.) data, leading to potential performance degradation and hindered convergence. While prior studies predominantly addressed the issue of skewed label distribution, our research addresses a crucial yet frequently overlooked problem known as multi-domain FL. In this scenario, clients' data originate from diverse domains with distinct feature distributions, rather than distinct label distributions. To address the multi-domain problem in FL, we propose a novel method called Federated learning Without normalizations (FedWon). FedWon draws inspiration from the observation that batch normalization (BN) faces challenges in effectively modeling the statistics of multiple domains, while existing alternative normalization techniques possess their own limitations. To address these issues, FedWon eliminates the normalization layers in FL and reparameterizes convolution layers with scaled weight standardization. Extensive experiments on five datasets and five models demonstrate that FedWon surpasses both FedAvg and the current state-of-the-art method (FedBN) across all experimental setups, achieving notable accuracy improvements of more than 10% in certain domains. Furthermore, FedWon is versatile for both cross-silo and cross-device FL, exhibits robust domain generalization capability, and performs strongly even with a batch size as small as 1, thereby catering to resource-constrained devices. Additionally, FedWon can also effectively tackle the challenge of skewed label distribution.
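
The reparameterization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the formulation of scaled weight standardization commonly used for normalization-free networks: each output channel's weights are shifted to zero mean and scaled to variance 1/fan-in, then multiplied by a gain. The function name, the `gain` and `gamma` parameters, and the `eps` handling are illustrative assumptions, not details taken from this page.

```python
import math


def scaled_weight_standardize(weight, gain=1.0, gamma=1.0, eps=1e-5):
    """Standardize each output channel of a (hypothetical) conv weight.

    weight: list of rows; each row is one flattened filter (fan_in values).
    gain:   stand-in for a learnable per-layer scale (assumption).
    gamma:  fixed constant chosen for the activation function (assumption).
    """
    standardized = []
    for row in weight:
        fan_in = len(row)
        mean = sum(row) / fan_in
        var = sum((w - mean) ** 2 for w in row) / fan_in
        # Zero mean, variance about 1/fan_in, then rescale by gain * gamma.
        scale = gain * gamma / math.sqrt(var * fan_in + eps)
        standardized.append([(w - mean) * scale for w in row])
    return standardized


# Two output channels, each with four input weights.
filters = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 2.0, 0.0]]
ws_filters = scaled_weight_standardize(filters)
```

Because the standardization depends only on the weights and not on batch statistics, a layer built this way behaves the same at any batch size, which is consistent with the batch-size-1 results the abstract reports.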

Paper Structure (25 sections, 4 equations, 14 figures, 22 tables)

Introduction
Preliminary
Federated Learning with Batch Normalization
Alternative Normalization Methods
Normalization-free Networks
Federated Learning Without Normalization
Problem Setup
Normalization-Free Federated Learning
Experiments on Multi-domain FL
Experiment Setup
Datasets.
Implementation Details.
Performance Evaluation
Ablation Studies
Experiments on Skewed Label Distribution
...and 10 more sections

Figures (14)

Figure 1: (a) We consider multi-domain federated learning, where each client contains data of one domain. This setting is highly practical and applicable in real-world scenarios. For example, autonomous cars in distinct locations capture images in varying weather conditions. (b) Visualization of batch normalization (BN) channel-wise statistics from two clients, each with data from a single domain. The upper and lower figures are results from the 4th and 5th BN layers of a 6-layer CNN, respectively. They highlight the different feature statistics of BN layers trained on different domains.
Figure 2: Illustration of three FL algorithms: (a) FedAvg aggregates both convolution (Conv) layers and batch normalization (BN) layers in the server; (b) FedBN keeps BN layers in clients and only aggregates Conv layers; (c) Our proposed Federated learning Without normalizations (FedWon) removes all BN layers and reparameterizes Conv layers with scaled weight standardization (WSConv).
Figure 3: FedAvg without (w/o) BN yields inferior results.
Figure 4: Testing accuracy comparison of FedWon and FedAvg on the Digits-Five dataset. Left: comparison of performance using small batch sizes B = {1, 2}, where 10 out of 100 clients are randomly selected to train in each round. Right: comparison of testing accuracy over the course of training with 10 randomly selected clients out of a total of 100 and batch size B = 2.
Figure 5: Analysis of feature maps with the Caltech-10 dataset. Top: visualization of feature maps of the last convolution layer. Bottom: comparison of the average cosine similarity of feature maps between clients (C$\leftrightarrow$C), and between a client and the server (S$\leftrightarrow$C).
...and 9 more figures
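
The server-side difference between the three algorithms in Figure 2 can be sketched as follows. This is a minimal illustration, assuming the usual dataset-size-weighted averaging for FedAvg; the parameter names (`conv1.weight`, `bn1.weight`), the `skip` predicate, and the flat-list representation of tensors are hypothetical.

```python
def aggregate(client_states, client_sizes, skip=lambda name: False):
    """Size-weighted average of client parameters.

    Parameters whose name matches `skip` are left out of the result,
    meaning they stay local to each client.
    """
    total = sum(client_sizes)
    names = [n for n in client_states[0] if not skip(n)]
    return {
        n: [
            sum(state[n][i] * size
                for state, size in zip(client_states, client_sizes)) / total
            for i in range(len(client_states[0][n]))
        ]
        for n in names
    }


clients = [
    {"conv1.weight": [1.0, 2.0], "bn1.weight": [1.0]},
    {"conv1.weight": [3.0, 6.0], "bn1.weight": [2.0]},
]
sizes = [1, 3]

# (a) FedAvg: average every parameter, BN included.
fedavg_state = aggregate(clients, sizes)
# (b) FedBN: keep BN parameters on the clients, average the rest.
fedbn_state = aggregate(clients, sizes, skip=lambda n: n.startswith("bn"))
# (c) FedWon: the model has no BN parameters, so plain averaging covers all of it.
fedwon_clients = [{"conv1.weight": c["conv1.weight"]} for c in clients]
fedwon_state = aggregate(fedwon_clients, sizes)
```

In this sketch FedWon needs no special handling on the server, since removing BN leaves nothing domain-specific to exclude from averaging.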
