SSFL: Discovering Sparse Unified Subnetworks at Initialization for Efficient Federated Learning

Riyasat Ohib, Bishal Thapaliya, Gintare Karolina Dziugaite, Jingyu Liu, Vince Calhoun, Sergey Plis

TL;DR

Salient Sparse Federated Learning (SSFL) is a streamlined approach to sparse federated learning with efficient communication. It consistently improves the accuracy-sparsity trade-off, achieving more than 20\% relative error reduction on CIFAR-10 compared to the strongest sparse baseline.

Abstract

In this work, we propose Salient Sparse Federated Learning (SSFL), a streamlined approach for sparse federated learning with efficient communication. SSFL identifies a sparse subnetwork prior to training: parameter saliency scores are computed separately on each client's local data in non-IID scenarios and then aggregated to determine a global mask. Only the sparse model weights are trained and communicated between the clients and the server in each round. On standard benchmarks including CIFAR-10, CIFAR-100, and Tiny-ImageNet, SSFL consistently improves the accuracy-sparsity trade-off, achieving more than 20\% relative error reduction on CIFAR-10 compared to the strongest sparse baseline, while reducing communication costs by $2 \times$ relative to dense FL. Finally, in a real-world federated learning deployment, SSFL delivers over $2.3 \times$ faster communication, underscoring its practical efficiency.
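The mask-discovery step described above can be illustrated with a minimal sketch. It assumes that each client has already produced one saliency score per parameter (how those scores are computed is not specified here), that aggregation is a weighted average with weights proportional to each client's data size, and that the global mask keeps the highest-scoring fraction of parameters. The function names `aggregate_saliency` and `global_mask`, the top-k selection rule, and the toy numbers are all hypothetical and chosen for illustration only; they are not taken from the authors' implementation.

```python
from typing import List, Sequence


def aggregate_saliency(client_scores: Sequence[Sequence[float]],
                       client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of per-client saliency scores.

    Assumption: client k gets weight p_k = n_k / sum(n), i.e. its share
    of the total data.
    """
    total = sum(client_sizes)
    agg = [0.0] * len(client_scores[0])
    for scores, n_k in zip(client_scores, client_sizes):
        p_k = n_k / total
        for i, s in enumerate(scores):
            agg[i] += p_k * s
    return agg


def global_mask(agg_scores: Sequence[float], sparsity: float) -> List[int]:
    """Binary mask keeping the top (1 - sparsity) fraction of parameters.

    Assumption: top-k selection on the aggregated scores; ties are broken
    in favour of the lower parameter index.
    """
    n_keep = max(1, round(len(agg_scores) * (1.0 - sparsity)))
    order = sorted(range(len(agg_scores)), key=lambda i: (-agg_scores[i], i))
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(agg_scores))]


# Toy example: two clients, four parameters (made-up scores and sizes).
client_scores = [
    [0.9, 0.1, 0.4, 0.0],  # client 0
    [0.2, 0.8, 0.4, 0.1],  # client 1
]
client_sizes = [300, 100]  # client 0 holds 75% of the data

agg = aggregate_saliency(client_scores, client_sizes)
mask = global_mask(agg, sparsity=0.5)

# Only the weights selected by the shared mask would be trained and sent.
weights = [0.5, -1.2, 0.3, 0.7]
payload = [w for w, m in zip(weights, mask) if m == 1]
print(mask, payload)
```

In this toy setting the larger client dominates the aggregated scores, so the shared mask keeps parameters 0 and 2, and the per-round payload shrinks from four values to two.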

Paper Structure (63 sections, 14 equations, 8 figures, 6 tables, 2 algorithms)


Figures (8)

  • Figure 1: Illustration of distributed connection importance in the non-IID setting. The parameter saliency scores from each site, calculated on local minibatches of equal class distribution, are aggregated, weighted by the proportion of the data available at that site. The common mask generated from the aggregated scores is applied to the local client models.
  • Figure 2: Performance comparison of SSFL with similar sparse FL methods at varying levels of sparsity, using ResNet18 on the non-IID (a) CIFAR-10 and (b) CIFAR-100 datasets.
  • Figure 3: Analysis of SSFL in different scenarios. (a) Effect of random intra-layer shuffling on SSFL masks. (b) Wall-time communication comparison between baseline dense communication and SSFL on CIFAR-10 across ResNet models of varying complexities.
  • Figure 4: (a-b) Mask Convergence. Mask error relative to the oracle decreases exponentially, stabilizing near $K \approx 80\text{--}100$ (vertical line). We use $K=100$ for all experiments. Experiments done with 5 random seeds. (c-d) Client Performance. Local accuracy (c, bars) plotted against aggregation weights $p_k$ (orange line), alongside sample counts (d). SSFL maintains high performance even on minimal data partitions (e.g., Clients 8 and 9).
  • Figure 5: Out-of-distribution (OOD) classes are introduced at round 225 (dotted vertical line). Following the single mask update, the model rapidly acquires the new concepts (red curve), rising from 0% to over 80% accuracy, while maintaining stable global performance (green curve).
  • ...and 3 more figures