SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low computational Overhead

Minsu Kim; Walid Saad; Merouane Debbah; Choong Seon Hong

TL;DR

SpaFL tackles the high communication and computation costs of federated learning by learning structured sparsity through per-filter/per-neuron trainable thresholds. Only the thresholds are communicated; model parameters stay on the devices, so each client keeps a personalized sparse model while the global thresholds reflect aggregated parameter importance. The approach is supported by a theoretical generalization bound showing improved performance with increased sparsity, and empirical results demonstrate higher accuracy with substantially lower communication and FLOPs than dense or other sparse baselines, including applicability to ViT architectures. Overall, SpaFL offers a scalable, communication-efficient FL framework suited to deployment on resource-constrained devices.
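To make the communication saving concrete, the following minimal sketch counts what a client would upload for a single convolutional layer under the two schemes. The layer sizes and the helper name `conv_layer_upload_sizes` are hypothetical, and biases are ignored; it only illustrates the "one threshold per filter" idea described above.

```python
def conv_layer_upload_sizes(out_channels, in_channels, kernel_size):
    """Return (num_parameters, num_thresholds) for one conv layer, ignoring biases."""
    # Sending the full layer means sending every kernel weight.
    num_parameters = out_channels * in_channels * kernel_size * kernel_size
    # Sending thresholds only means one scalar per filter.
    num_thresholds = out_channels
    return num_parameters, num_thresholds


# Hypothetical layer: 64 filters, 32 input channels, 3x3 kernels.
params, thresholds = conv_layer_upload_sizes(64, 32, 3)
print(params, thresholds)  # 18432 64
```

For this assumed layer, the threshold vector is 288 times smaller than the weight tensor.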

Abstract

The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune all of its connected parameters, thereby leading to structured sparsity. To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby learning how to prune. Further, global thresholds are used to update model parameters by extracting aggregated parameter importance. The generalization bound of SpaFL is also derived, thereby providing key insights on the relation between sparsity and performance. Experimental results show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines. The code is available at https://github.com/news-vt/SpaFL_NeruIPS_2024
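The threshold mechanism described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes that a weight is pruned when its magnitude falls below the threshold of its neuron/filter, and that the server aggregates thresholds by plain averaging. The function names and the toy values are hypothetical.

```python
def prune_with_thresholds(weights, thresholds):
    """Zero out each weight whose magnitude is below its row's threshold.

    weights:    list of rows, one row of weights per neuron/filter.
    thresholds: one threshold per row (assumed magnitude-based pruning rule).
    """
    return [[w if abs(w) >= tau else 0.0 for w in row]
            for row, tau in zip(weights, thresholds)]


def aggregate_thresholds(client_thresholds):
    """Server step (assumed): element-wise average of the clients' thresholds."""
    n = len(client_thresholds)
    return [sum(taus) / n for taus in zip(*client_thresholds)]


# Two hypothetical clients; the weights never leave client A.
w_a = [[0.5, -0.1, 0.35], [0.05, 0.25, -0.4]]
tau_a = [0.2, 0.1]
tau_b = [0.4, 0.3]

global_tau = aggregate_thresholds([tau_a, tau_b])  # only thresholds are exchanged
sparse_w_a = prune_with_thresholds(w_a, global_tau)
print(sparse_w_a)  # [[0.5, 0.0, 0.35], [0.0, 0.25, -0.4]]
```

Because the thresholds are shared per row while the weights stay local, each client ends up with its own sparse model even though all clients apply the same global thresholds.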

Paper Structure (29 sections, 4 theorems, 46 equations, 6 figures, 5 tables, 1 algorithm)

Introduction
Background and Related Work
Federated Learning
Training and Finding Sparse Models in FL
SpaFL Algorithm
Structured Pruning with Trainable Thresholds
Problem Formulation
Algorithm Overview
Local Training for Parameters and Thresholds
Learning Parameter Importance From Thresholds
Extracting Parameter Importance from Global Thresholds
Theoretical Analysis of SpaFL
Experiments
Experiments Configuration
Baselines
...and 14 more sections

Key Result

Theorem 1

For the loss function $||\mathcal{L}||_{\infty} \leq 1$, the training data size $D \geq \frac{2}{\epsilon'^2} \ln \left( \frac{16}{\exp(-\epsilon' \delta')} \right)$ and the total number of communication rounds $T$, we have where $\epsilon' = \sqrt{2T \log\frac{1}{\tilde{\delta}} \tilde{\epsilon}^2} + T \tilde{\epsilon} \frac{\exp(\tilde{\epsilon}) -1}{\exp(\tilde{\epsilon}) +1}$, where $\xi$ is
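The expression for $\epsilon'$ in the theorem statement can be evaluated directly. The sketch below is a plain transcription of that formula; the function name `epsilon_prime` and the input values are hypothetical and chosen only for illustration, and `math.log` is taken to be the natural logarithm.

```python
import math


def epsilon_prime(eps_tilde, delta_tilde, num_rounds):
    """Evaluate eps' = sqrt(2 T log(1/delta~) eps~^2) + T eps~ (e^eps~ - 1)/(e^eps~ + 1)."""
    first = math.sqrt(2 * num_rounds * math.log(1 / delta_tilde) * eps_tilde ** 2)
    ratio = (math.exp(eps_tilde) - 1) / (math.exp(eps_tilde) + 1)
    second = num_rounds * eps_tilde * ratio
    return first + second


# Hypothetical inputs: eps~ = 0.1, delta~ = 1e-5, T = 100 rounds.
value = epsilon_prime(0.1, 1e-5, 100)
print(value)
```

The first term grows with $\sqrt{T}$ and the second linearly in $T$, so for a fixed $\tilde{\epsilon}$ the value of $\epsilon'$ increases with the number of communication rounds.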

Figures (6)

Figure 1: Illustration of SpaFL framework that performs model pruning through thresholds. Only the thresholds are communicated between the server and clients.
Figure 2: Learning curves on FMNIST, CIFAR-10, and CIFAR-100
Figure 3: Sparsity pattern of conv1 layer on CIFAR-10
Figure 4: Sparsity patterns of conv2 layer on CIFAR-10
Figure 5: Sparsity patterns of dense1 layer on CIFAR-10
...and 1 more figure

Theorems & Definitions (7)

Theorem 1
Definition 1
Lemma 1
Lemma 2
proof
Theorem 2
proof
