FedWon: Triumphing Multi-domain Federated Learning Without Normalization
Weiming Zhuang, Lingjuan Lyu
TL;DR
This paper tackles multi-domain federated learning (FL), where client data come from different domains and batch normalization (BN) statistics diverge across clients, hindering convergence. It proposes FedWon, a normalization-free FL approach that removes BN layers and reparameterizes convolutional layers with scaled weight standardization (WSConv), so that training remains stable under FedAvg-style aggregation. Across five datasets and multiple architectures, FedWon consistently outperforms FedAvg and the state-of-the-art FedBN, with accuracy gains exceeding 10% on some domains. It remains robust even at batch size 1, applies to both cross-silo and cross-device FL, and also handles skewed label distributions. The work further includes ablations and demonstrates FedWon’s domain generalization, feature-map alignment, and applicability to medical imaging.
Abstract
Federated learning (FL) enhances data privacy with collaborative in-situ training on decentralized clients. Nevertheless, FL encounters challenges due to non-independent and identically distributed (non-i.i.d.) data, which can degrade performance and hinder convergence. While prior studies predominantly addressed skewed label distributions, our research addresses a crucial yet frequently overlooked problem known as multi-domain FL. In this scenario, clients' data originate from diverse domains with distinct feature distributions, rather than distinct label distributions. To address the multi-domain problem in FL, we propose a novel method called Federated learning Without normalizations (FedWon). FedWon draws inspiration from the observation that batch normalization (BN) struggles to model the statistics of multiple domains effectively, while existing alternative normalization techniques have their own limitations. To address these issues, FedWon eliminates the normalization layers in FL and reparameterizes convolution layers with scaled weight standardization. Extensive experiments on five datasets and five models demonstrate that FedWon surpasses both FedAvg and the current state-of-the-art method (FedBN) across all experimental setups, achieving accuracy improvements of more than 10% in certain domains. Furthermore, FedWon is versatile for both cross-silo and cross-device FL and exhibits robust domain generalization capability. It performs strongly even with a batch size as small as 1, which suits resource-constrained devices. FedWon can also effectively tackle the challenge of skewed label distribution.
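To make the reparameterization concrete, the following is a minimal, dependency-free sketch of scaled weight standardization followed by a FedAvg-style weighted average. It is an illustration, not the authors' implementation. The abstract does not state the formula, so the form used here is an assumption taken from the normalization-free networks literature: each output channel's weights are centered and divided by their standard deviation times the square root of the fan-in, then multiplied by a gain `gamma`. The function names, the flattened list-of-rows weight layout, and the `eps` term are all hypothetical choices made for this example.

```python
import math

def scaled_weight_standardize(weight, gamma=1.0, eps=1e-5):
    """Standardize each output channel's weights (assumed formulation).

    weight: list of rows, one per output channel; each row is the
            flattened fan-in (in_channels * kernel_h * kernel_w) weights.
    gamma:  fixed gain; its value depends on the activation (assumption).
    eps:    small constant for numerical stability (assumption).
    """
    standardized = []
    for row in weight:
        fan_in = len(row)
        mean = sum(row) / fan_in
        var = sum((w - mean) ** 2 for w in row) / fan_in
        # Divide by std * sqrt(fan_in), i.e. sqrt(var * fan_in).
        scale = gamma / math.sqrt(var * fan_in + eps)
        standardized.append([(w - mean) * scale for w in row])
    return standardized

def fedavg(client_weights, client_sizes):
    """Average raw client weights, weighted by local sample counts."""
    total = sum(client_sizes)
    n_rows = len(client_weights[0])
    n_cols = len(client_weights[0][0])
    averaged = [[0.0] * n_cols for _ in range(n_rows)]
    for weights, size in zip(client_weights, client_sizes):
        for i in range(n_rows):
            for j in range(n_cols):
                averaged[i][j] += weights[i][j] * size / total
    return averaged

# Two hypothetical clients, each holding one output channel with fan-in 4.
client_a = [[1.0, 2.0, 3.0, 4.0]]
client_b = [[2.0, 4.0, 6.0, 8.0]]
global_raw = fedavg([client_a, client_b], client_sizes=[100, 300])
# The standardized weights are what the convolution would actually use.
global_ws = scaled_weight_standardize(global_raw)
```

In this sketch the server averages the raw weights and the standardization is applied when the weights are used in the forward pass. Because the standardization depends only on the weights and not on batch statistics, there are no per-client running means or variances to aggregate, and nothing depends on the batch size.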
