FedWon: Triumphing Multi-domain Federated Learning Without Normalization

Weiming Zhuang, Lingjuan Lyu

TL;DR

This paper tackles multi-domain federated learning (FL), where each client's data come from a different domain, so batch normalization (BN) statistics diverge across clients and hinder convergence. It proposes FedWon, a normalization-free approach that removes BN layers and reparameterizes convolutional layers with scaled weight standardization (WSConv), which keeps training stable under FedAvg-style aggregation. Across five datasets and five models, FedWon outperforms FedAvg and the state-of-the-art FedBN, with accuracy gains exceeding 10% in some domains and strong performance even at batch size 1. It applies to both cross-silo and cross-device FL and can also handle skewed label distributions. The paper further reports ablations, domain generalization results, a feature-map analysis, and experiments on medical imaging.

Abstract

Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d) data, leading to potential performance degradation and hindered convergence. While prior studies predominantly addressed the issue of skewed label distribution, our research addresses a crucial yet frequently overlooked problem known as multi-domain FL. In this scenario, clients' data originate from diverse domains with distinct feature distributions, instead of label distributions. To address the multi-domain problem in FL, we propose a novel method called Federated learning Without normalizations (FedWon). FedWon draws inspiration from the observation that batch normalization (BN) faces challenges in effectively modeling the statistics of multiple domains, while existing normalization techniques possess their own limitations. In order to address these issues, FedWon eliminates the normalization layers in FL and reparameterizes convolution layers with scaled weight standardization. Through extensive experimentation on five datasets and five models, our comprehensive experimental results demonstrate that FedWon surpasses both FedAvg and the current state-of-the-art method (FedBN) across all experimental setups, achieving notable accuracy improvements of more than 10% in certain domains. Furthermore, FedWon is versatile for both cross-silo and cross-device FL, exhibiting robust domain generalization capability, showcasing strong performance even with a batch size as small as 1, thereby catering to resource-constrained devices. Additionally, FedWon can also effectively tackle the challenge of skewed label distribution.
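
The reparameterization mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual form of scaled weight standardization from the normalization-free network literature: each output channel's weights are centered and rescaled over their fan-in N, roughly W_hat = gamma * (W - mean) / (std * sqrt(N)). The function name, the default `gamma`, the `eps` value, and the flattened-list weight layout are illustrative assumptions, not the authors' implementation.

```python
import math


def scaled_weight_standardize(weights, gamma=1.0, eps=1e-4):
    """Standardize each output channel's weights over its fan-in.

    weights: list of rows, one flattened filter per output channel
             (hypothetical layout chosen for this sketch).
    gamma:   fixed scaling constant (assumed; in practice it depends
             on the activation function that follows the layer).
    eps:     small constant for numerical stability (assumed value).
    """
    standardized = []
    for row in weights:
        n = len(row)  # fan-in N of this output channel
        mean = sum(row) / n
        var = sum((w - mean) ** 2 for w in row) / n
        # Dividing by sqrt(var * N) equals dividing by std * sqrt(N).
        scale = gamma / math.sqrt(var * n + eps)
        standardized.append([(w - mean) * scale for w in row])
    return standardized


# Two output channels with very different weight scales.
filters = [[1.0, 2.0, 3.0, 4.0], [10.0, 0.0, -10.0, 0.0]]
ws_filters = scaled_weight_standardize(filters)
```

After this step every channel has zero-mean weights whose squared values sum to about gamma squared, regardless of the original scale. The standardization depends only on the weights, not on batch statistics, which is consistent with the abstract's claim that the method works at a batch size of 1.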

Paper Structure

This paper contains 25 sections, 4 equations, 14 figures, 22 tables.

Figures (14)

  • Figure 1: (a) We consider multi-domain federated learning, where each client contains data of one domain. This setting is highly practical and applicable in real-world scenarios. For example, autonomous cars in distinct locations capture images in varying weather conditions. (b) Visualization of batch normalization (BN) channel-wise statistics from two clients, each with data from a single domain. The upper and lower figures are results from the 4th and 5th BN layers of a 6-layer CNN, respectively. They highlight the different feature statistics of BN layers trained on different domains.
  • Figure 2: Illustration of three FL algorithms: (a) FedAvg aggregates both convolution (Conv) layers and batch normalization (BN) layers in the server; (b) FedBN keeps BN layers in clients and only aggregates Conv layers; (c) Our proposed Federated learning Without normalizations (FedWon) removes all BN layers and reparameterizes Conv layers with scaled weight standardization (WSConv).
  • Figure 3: FedAvg without (w/o) BN yields inferior results.
  • Figure 4: Testing accuracy comparison of FedWon and FedAvg on the Digits-Five dataset. Left: comparison of performance using small batch sizes B = {1, 2}, where 10 out of 100 clients are randomly selected to train in each round. Right: comparison of testing accuracy over the course of training with 10 randomly selected clients out of a total of 100 and batch size B = 2.
  • Figure 5: Analysis of feature maps with the Caltech-10 dataset. Top: visualization of feature maps of the last convolution layer. Bottom: comparison of the average cosine similarity of feature maps between clients (C$\leftrightarrow$C), and between a client and the server (S$\leftrightarrow$C).
  • ...and 9 more figures
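
The three aggregation schemes described in the Figure 2 caption can be sketched as one weighted-averaging routine with an optional filter. This is a simplified illustration, not the authors' code: parameters are plain lists of floats, clients are weighted by local sample count, and the `bn` name prefix used to identify BN layers is a hypothetical naming convention.

```python
def aggregate(client_states, client_sizes, skip=lambda name: False):
    """Weighted average of client parameters (FedAvg-style).

    client_states: list of dicts mapping parameter name -> list of floats.
    client_sizes:  number of local samples per client, used as weights.
    skip:          predicate for parameters that stay on the clients.
    """
    total = sum(client_sizes)
    global_state = {}
    for name in client_states[0]:
        if skip(name):
            continue  # kept local, never sent to the server
        length = len(client_states[0][name])
        global_state[name] = [
            sum(state[name][i] * size
                for state, size in zip(client_states, client_sizes)) / total
            for i in range(length)
        ]
    return global_state


# Two clients with a conv layer and a BN layer (hypothetical names).
clients = [
    {"conv1.weight": [1.0, 2.0], "bn1.weight": [0.5]},
    {"conv1.weight": [3.0, 4.0], "bn1.weight": [1.5]},
]
sizes = [1, 3]

# (a) FedAvg: aggregate every layer, including BN.
fedavg_state = aggregate(clients, sizes)

# (b) FedBN: keep BN layers local and aggregate only the rest.
fedbn_state = aggregate(clients, sizes, skip=lambda n: n.startswith("bn"))

# (c) FedWon: the model has no BN layers, so plain averaging suffices.
wson_clients = [{k: v for k, v in c.items() if not k.startswith("bn")}
                for c in clients]
fedwon_state = aggregate(wson_clients, sizes)
```

In this toy setup FedBN and FedWon produce the same server-side result. The difference lies on the client: FedBN clients still hold their own BN layers, while a FedWon model has none to keep or share.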