SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead

Minsu Kim, Walid Saad, Merouane Debbah, Choong Seon Hong

TL;DR

SpaFL tackles the high communication and computation costs of federated learning by learning structured sparsity through per-filter/per-neuron trainable thresholds. Only thresholds are communicated, while local parameters remain on devices, allowing personalized sparse models and global thresholds to reflect aggregated parameter importance. The approach is supported by a theoretical generalization bound showing improved performance with increased sparsity, and empirical results demonstrate higher accuracy with substantially lower communication and FLOPs than dense or other sparse baselines, including applicability to ViT architectures. Overall, SpaFL offers a scalable, communication-efficient FL framework with practical impact for deploying learning on resource-constrained devices.
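To make the communication saving concrete, the short sketch below compares how many values a client would upload if it sent every weight versus one threshold per filter/neuron. The layer shapes are hypothetical and chosen only for illustration, and the sketch assumes one threshold per output unit in every layer with biases ignored; neither detail is taken from the paper.

```python
# Hypothetical layer shapes, for illustration only (not from the paper).
# Conv entries are (out_channels, in_channels, kernel_h, kernel_w);
# dense entries are (out_features, in_features).
LAYERS = {
    "conv1": (64, 3, 5, 5),
    "conv2": (64, 64, 5, 5),
    "dense1": (384, 1600),
    "dense2": (10, 384),
}


def count_parameters(shape):
    """Number of weights in a layer (biases ignored for simplicity)."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def count_thresholds(shape):
    """Assumed: one threshold per filter/neuron, i.e. per output unit."""
    return shape[0]


def payload_summary(layers):
    """Return (weights, thresholds) that one upload would contain."""
    params = sum(count_parameters(s) for s in layers.values())
    thresholds = sum(count_thresholds(s) for s in layers.values())
    return params, thresholds


params, thresholds = payload_summary(LAYERS)
print(f"weights per upload:    {params}")
print(f"thresholds per upload: {thresholds}")
print(f"reduction factor:      {params / thresholds:.1f}x")
```

For these made-up shapes the threshold payload is three orders of magnitude smaller than the weight payload; the actual ratio depends entirely on the architecture used.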

Abstract

The large communication and computation overhead of federated learning (FL) is one of the main challenges facing its practical deployment over resource-constrained clients and systems. In this work, SpaFL, a communication-efficient FL framework, is proposed to optimize sparse model structures with low computational overhead. In SpaFL, a trainable threshold is defined for each filter/neuron to prune all of its connected parameters, thereby leading to structured sparsity. To optimize the pruning process itself, only thresholds are communicated between a server and clients instead of parameters, thereby enabling them to learn how to prune. Further, global thresholds are used to update model parameters by extracting aggregated parameter importance. The generalization bound of SpaFL is also derived, thereby proving key insights on the relation between sparsity and performance. Experimental results show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines. The code is available at https://github.com/news-vt/SpaFL_NeruIPS_2024
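The threshold mechanism described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a weight is kept only when its magnitude exceeds the threshold of the neuron it feeds, and that the server combines client thresholds with a simple element-wise mean. The function names and all numeric values are hypothetical.

```python
def apply_thresholds(weights, thresholds):
    """Zero every weight whose magnitude does not exceed its row's threshold.

    weights:    list of rows, one row of incoming weights per neuron/filter
    thresholds: one non-negative threshold per row
    (Masking rule is an assumption made for this sketch.)
    """
    if len(weights) != len(thresholds):
        raise ValueError("need exactly one threshold per neuron/filter")
    return [
        [w if abs(w) > tau else 0.0 for w in row]
        for row, tau in zip(weights, thresholds)
    ]


def average_thresholds(client_thresholds):
    """Assumed server step: element-wise mean of the clients' thresholds."""
    num_clients = len(client_thresholds)
    return [sum(vals) / num_clients for vals in zip(*client_thresholds)]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


# Two hypothetical clients; each keeps its own weights for a 3-neuron layer.
client_weights = [
    [[0.9, -0.05, 0.3], [0.02, 0.01, -0.03], [-0.6, 0.4, 0.1]],
    [[0.7, 0.25, -0.1], [0.04, -0.02, 0.05], [0.5, -0.3, 0.15]],
]
# Only these per-neuron thresholds would travel to the server.
client_thresholds = [
    [0.1, 0.2, 0.3],
    [0.3, 0.4, 0.1],
]

global_thresholds = average_thresholds(client_thresholds)
pruned = [apply_thresholds(w, global_thresholds) for w in client_weights]

for idx, layer in enumerate(pruned):
    print(f"client {idx}: sparsity = {sparsity(layer):.2f}")
```

In this toy example every weight of the second neuron falls below the averaged threshold on both clients, so the whole row is zeroed. That is the sense in which per-neuron thresholds can produce structured rather than scattered sparsity, while each client still holds its own (personalized) surviving weights.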

Paper Structure (29 sections, 4 theorems, 46 equations, 6 figures, 5 tables, 1 algorithm)

Key Result

Theorem 1

For a loss function with $\|\mathcal{L}\|_{\infty} \leq 1$, training data size $D \geq \frac{2}{\epsilon'^2} \ln \left( \frac{16}{\exp(-\epsilon' \delta')} \right)$, and total number of communication rounds $T$, we have … where $\epsilon' = \sqrt{2T \log\frac{1}{\tilde{\delta}} \tilde{\epsilon}^2} + T \tilde{\epsilon} \frac{\exp(\tilde{\epsilon}) -1}{\exp(\tilde{\epsilon}) +1}$, and where $\xi$ is …
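The expression for $\epsilon'$ in Theorem 1 can be evaluated numerically to see how it scales with the number of rounds $T$. The helper below is only a sketch of that arithmetic: the function name and the input values are hypothetical, and since this summary does not define $\tilde{\epsilon}$ and $\tilde{\delta}$, they are treated simply as a positive number and a number in $(0, 1)$.

```python
import math


def epsilon_prime(num_rounds, eps_tilde, delta_tilde):
    """Evaluate eps' = sqrt(2 T log(1/delta~) eps~^2)
                       + T eps~ (exp(eps~) - 1) / (exp(eps~) + 1).

    The argument ranges checked below are assumptions of this sketch.
    """
    if num_rounds <= 0:
        raise ValueError("num_rounds must be positive")
    if eps_tilde <= 0 or not 0 < delta_tilde < 1:
        raise ValueError("need eps_tilde > 0 and 0 < delta_tilde < 1")
    # First term: grows like sqrt(T).
    root_term = math.sqrt(
        2 * num_rounds * math.log(1 / delta_tilde) * eps_tilde ** 2
    )
    # Second term: grows linearly in T.
    linear_term = (
        num_rounds
        * eps_tilde
        * (math.exp(eps_tilde) - 1)
        / (math.exp(eps_tilde) + 1)
    )
    return root_term + linear_term


# Hypothetical inputs, for illustration only.
for rounds in (100, 200, 400):
    print(rounds, epsilon_prime(rounds, eps_tilde=0.1, delta_tilde=1e-5))
```

Because the first term grows with $\sqrt{T}$ and the second with $T$, $\epsilon'$ increases with the number of communication rounds for fixed $\tilde{\epsilon}$ and $\tilde{\delta}$.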

Figures (6)

  • Figure 1: Illustration of SpaFL framework that performs model pruning through thresholds. Only the thresholds are communicated between the server and clients.
  • Figure 2: Learning curves on FMNIST, CIFAR-10, and CIFAR-100
  • Figure 3: Sparsity pattern of conv1 layer on CIFAR-10
  • Figure 4: Sparsity patterns of conv2 layer on CIFAR-10
  • Figure 5: Sparsity patterns of dense1 layer on CIFAR-10
  • ...and 1 more figure

Theorems & Definitions (7)

  • Theorem 1
  • Definition 1
  • Lemma 1
  • Lemma 2
  • proof
  • Theorem 2
  • proof