FedPeWS: Personalized Warmup via Subnetworks for Enhanced Heterogeneous Federated Learning

Nurbek Tastan; Samuel Horvath; Martin Takac; Karthik Nandakumar


TL;DR

FedPeWS tackles extreme non-IID data in federated learning by introducing a personalized warmup. During the early rounds, each participant trains a subnetwork selected by a learnable neuron-level mask; training then switches to standard full-parameter federated optimization. The masks are identified through gradient-based learning with a mask-generation function and a diversity term, and a fixed-subnetwork variant can be used instead when the data distributions are known. Across synthetic, image, and medical datasets, FedPeWS consistently reduces the number of communication rounds needed to reach a target accuracy and improves final performance, and it is robust to the hyperparameters that govern mask learning and warmup length. The approach is a practical, plug-and-play enhancement to existing FL optimizers, offering faster convergence and better generalization in highly heterogeneous cross-silo settings.
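The fixed-subnetwork variant can be illustrated with a minimal sketch. It assumes, purely for illustration, that each hidden neuron of a layer is assigned to exactly one participant in round-robin order, and that switching a neuron off masks its whole row of incoming weights and its bias. The helper names `fixed_neuron_masks` and `expand_to_linear_layer` are hypothetical; this is not the authors' code.

```python
import numpy as np


def fixed_neuron_masks(num_neurons: int, num_participants: int) -> np.ndarray:
    """Hypothetical round-robin split: neuron j belongs to participant j % N.

    Returns a boolean array of shape (N, num_neurons); row i is the
    neuron-level mask of participant i.
    """
    owner = np.arange(num_neurons) % num_participants
    return owner[None, :] == np.arange(num_participants)[:, None]


def expand_to_linear_layer(neuron_mask: np.ndarray, in_features: int):
    """Turn a mask over a layer's output neurons into parameter masks.

    A masked-out neuron loses its entire row of the (out x in) weight matrix
    and its bias entry. Only this layer is masked here; how the next layer
    treats inactive inputs is left open in this sketch.
    """
    weight_mask = np.repeat(neuron_mask[:, None], in_features, axis=1)
    bias_mask = neuron_mask.copy()
    return weight_mask, bias_mask


masks = fixed_neuron_masks(num_neurons=8, num_participants=4)
w_mask, b_mask = expand_to_linear_layer(masks[0], in_features=3)
print(masks.sum(axis=1))                # [2 2 2 2]
print(w_mask.shape, int(w_mask.sum()))  # (8, 3) 6
```

A disjoint split is only one choice; overlapping masks would let several participants share some neurons during warmup.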

Abstract

Statistical data heterogeneity is a significant barrier to convergence in federated learning (FL). While prior work has advanced heterogeneous FL through better optimization objectives, these methods fall short under extreme data heterogeneity among the collaborating participants. We hypothesize that convergence under extreme data heterogeneity is hindered primarily by the aggregation of conflicting updates from the participants in the initial collaboration rounds. To overcome this problem, we propose a warmup phase in which each participant learns a personalized mask and updates only a subnetwork of the full model. This personalized warmup allows the participants to focus initially on learning specific subnetworks tailored to the heterogeneity of their data. After the warmup phase, the participants revert to standard federated optimization, in which all parameters are communicated. We empirically demonstrate that the proposed personalized warmup via subnetworks (FedPeWS) approach improves accuracy and convergence speed over standard federated optimization methods.
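The warmup-then-full schedule described in the abstract can be sketched on a toy problem. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The masks are taken as given (the mask-generation function and diversity term used to learn them are not shown), local training is plain gradient descent, and the warmup aggregation rule is an assumed reading of aggregating the partial subnetworks $x_i^t \odot m_i^t$: each weight is averaged over the participants whose mask covers it, and uncovered weights keep their previous global value. The functions `local_update` and `fedpews_round` are hypothetical names.

```python
import numpy as np


def local_update(x, grad_fn, mask=None, lr=0.1, steps=5):
    """Run a few local gradient-descent steps starting from the global model.

    During warmup `mask` is the participant's (assumed given) binary mask, so
    gradients outside its subnetwork are zeroed and those weights stay fixed.
    After warmup, mask=None and all weights are updated.
    """
    x = x.copy()
    for _ in range(steps):
        g = grad_fn(x)
        if mask is not None:
            g = g * mask
        x = x - lr * g
    return x


def fedpews_round(t, warmup_rounds, x_global, grad_fns, masks):
    """One communication round: masked warmup for t < warmup_rounds, else FedAvg."""
    if t < warmup_rounds:
        locals_ = [local_update(x_global, g, m) for g, m in zip(grad_fns, masks)]
        num = sum(x_i * m for x_i, m in zip(locals_, masks))  # sum of x_i * m_i
        cnt = sum(masks)  # how many participants cover each weight
        # Assumed rule: average over covering participants; keep uncovered weights.
        return np.where(cnt > 0, num / np.maximum(cnt, 1), x_global)
    locals_ = [local_update(x_global, g) for g in grad_fns]
    return np.mean(locals_, axis=0)  # plain FedAvg over all parameters


# Toy example: two participants with conflicting quadratic losses
# f_i(x) = 0.5 * ||x - c_i||^2, whose gradient is x - c_i.
c1 = np.array([1.0, 1.0, -1.0, -1.0])
c2 = np.array([-1.0, -1.0, 1.0, 1.0])
grad_fns = [lambda x, c=c: x - c for c in (c1, c2)]
masks = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0, 1.0])]

x_warm = fedpews_round(0, 1, np.zeros(4), grad_fns, masks)  # warmup round
x_full = fedpews_round(1, 1, x_warm, grad_fns, masks)       # standard round
print(np.round(x_warm, 4))  # [0.4095 0.4095 0.4095 0.4095]
print(np.round(x_full, 4))  # [0.2418 0.2418 0.2418 0.2418]
```

In the warmup round each participant moves only its own half of the weights, so the first aggregate contains no conflicting updates; afterwards, plain FedAvg over all weights drives the model toward the consensus optimum (here the zero vector).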


Paper Structure (38 sections, 8 equations, 15 figures, 8 tables, 1 algorithm)


Introduction
Related Work
Collaborative Learning.
Independent Subnet Training.
Finding Subnetworks in FL.
Mixture of Experts.
Preliminaries
Federated Averaging (FedAvg).
Proposed FedPeWS Method
Identification of personalized subnetworks:
Use of fixed subnetworks:
Experiments and Results
Datasets and Network Architecture
Synthetic Dataset.
CIFAR-MNIST.
...and 23 more sections

Figures (15)

Figure 1: Conceptual illustration of training personalized subnetworks in federated learning.
Figure 2: Illustration of the proposed FedPeWS algorithm for two participants, which aggregates partial subnetworks ($x_i^t \odot m_i^t$) during the warmup phase to obtain a shared global model $x_g^t$. Here, $x_i^t$ and $m_i^t$ denote the local model and personalized mask of the $i^{\text{th}}$ participant in the $t^{\text{th}}$ round.
Figure 3: Samples from the custom synthetic dataset.
Figure 4: Results on the Synthetic-{32, 3.2}K datasets with batch sizes {32, 8}, global learning rates $\eta_g \in \{1.0, 0.5, 0.25, 0.1\}$, and communication rounds $T \in \{200, 250, 400, 500\}$. See the convergence table for details. FedPeWS consistently converges faster than FedAvg and outperforms it.
Figure 5: Visualization of validation accuracy and loss on the Synthetic-32K dataset with $N=4$.
...and 10 more figures
