Communication-Efficient Federated Learning via Regularized Sparse Random Networks

Mohamad Mestoukirdi, Omid Esrafilian, David Gesbert, Qianrui Li, Nicolas Gresset

TL;DR

This paper tackles communication bottlenecks in Federated Learning (FL) by training over-parameterized random networks and exchanging sparse binary masks, which cost at most $1$ bit per parameter. It shows that existing stochastic masking approaches do not reliably produce sparse sub-networks under consistent objectives, and it introduces a regularized loss that penalizes mask entropy to promote sparsity while preserving generalization. The authors formalize the global objective $\bar{F}(\boldsymbol{m}) = \frac{1}{\sum_i |\mathcal{D}_i|} \sum_{k=1}^K |\mathcal{D}_k| \ell(y_{\boldsymbol{m}}, \mathcal{D}_k) + \frac{\lambda}{n} H(\boldsymbol{m})$ and define a matching local loss with a regularization term, which is trained using straight-through estimators and Bernoulli mask sampling. Experiments on MNIST, CIFAR-10, and CIFAR-100 under IID and non-IID conditions show gains in communication and memory efficiency of up to about five orders of magnitude while maintaining competitive validation accuracy, which makes the method relevant for resource-constrained edge FL.
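To make the objective above concrete, here is a minimal Python sketch that evaluates $\bar{F}(\boldsymbol{m})$ from per-client losses and dataset sizes. This summary does not define $H(\boldsymbol{m})$ or $n$; the sketch assumes $n$ is the number of mask entries and $H(\boldsymbol{m}) = n \cdot h(\hat{p})$, where $\hat{p}$ is the fraction of ones in the mask and $h$ is the binary entropy in bits. The function names (`binary_entropy`, `mask_entropy`, `global_objective`) and the example numbers are hypothetical, not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source; defined as 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def mask_entropy(mask):
    """Assumed stand-in for H(m): n times the entropy of the fraction of ones."""
    n = len(mask)
    return n * binary_entropy(sum(mask) / n)


def global_objective(client_losses, client_sizes, mask, lam):
    """Dataset-size-weighted mean of client losses plus (lambda / n) * H(m)."""
    total = sum(client_sizes)
    data_term = sum(
        size * loss for size, loss in zip(client_sizes, client_losses)
    ) / total
    reg_term = lam / len(mask) * mask_entropy(mask)
    return data_term + reg_term


# Hypothetical numbers: two clients, an 8-entry mask, lambda = 0.5.
losses = [0.4, 0.8]
sizes = [300, 100]
dense_mask = [1, 0, 1, 0, 1, 0, 1, 0]   # half ones: h = 1 bit per entry
sparse_mask = [1, 0, 0, 0, 0, 0, 0, 0]  # one in eight: lower entropy

print(global_objective(losses, sizes, dense_mask, lam=0.5))
print(global_objective(losses, sizes, sparse_mask, lam=0.5))
```

Under this assumption the penalty equals $\lambda$ times the entropy-coded size of the mask in bits per parameter, so with equal data loss the sparser mask scores lower.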

Abstract

This work presents a new method for enhancing communication efficiency in stochastic Federated Learning that trains over-parameterized random networks. In this setting, a binary mask is optimized instead of the model weights, which are kept fixed. The mask characterizes a sparse sub-network that is able to generalize as well as a smaller target network. Importantly, sparse binary masks are exchanged rather than the floating-point weights used in traditional federated learning, reducing communication cost to at most 1 bit per parameter (Bpp). We show that previous state-of-the-art stochastic methods fail to find sparse networks that can reduce the communication and storage overhead using consistent loss objectives. To address this, we propose adding a regularization term to local objectives that acts as a proxy for the entropy of the transmitted masks, thereby encouraging sparser solutions by eliminating redundant features across sub-networks. Extensive empirical experiments demonstrate significant improvements in communication and memory efficiency of up to five orders of magnitude compared to the literature, with minimal performance degradation in validation accuracy in some instances.
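The mask-training idea can be illustrated with a small Python sketch of one common straight-through formulation. It is an assumption made for illustration, not necessarily the authors' exact procedure: each parameter has a real-valued score $s_j$, the mask entry is drawn as $m_j \sim \mathrm{Bernoulli}(\sigma(s_j))$, and the gradient is passed through the sampling step as if $\partial m_j / \partial \theta_j = 1$. The toy model (a single linear unit with squared error), the function names, and the hyperparameters are all hypothetical.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def sample_mask(scores, rng):
    """Draw m_j ~ Bernoulli(sigmoid(s_j)) for every score."""
    return [1 if rng.random() < sigmoid(s) else 0 for s in scores]


def ste_score_grads(weights, scores, mask, x, target):
    """Gradient of (y - target)^2 w.r.t. the scores, with y = sum_j w_j m_j x_j.

    Straight-through step: the sampling m_j ~ Bernoulli(theta_j) is treated
    as the identity, so dL/dtheta_j is approximated by dL/dm_j.
    """
    y = sum(w * m * xi for w, m, xi in zip(weights, mask, x))
    err = y - target
    grads = []
    for w, s, xi in zip(weights, scores, x):
        theta = sigmoid(s)
        dloss_dmask = 2.0 * err * w * xi
        grads.append(dloss_dmask * theta * (1.0 - theta))  # chain through sigmoid
    return grads


def local_step(weights, scores, x, target, lr, rng):
    """One local update: sample a mask, then take a gradient step on the scores."""
    mask = sample_mask(scores, rng)
    grads = ste_score_grads(weights, scores, mask, x, target)
    new_scores = [s - lr * g for s, g in zip(scores, grads)]
    return new_scores, mask


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(8)]  # fixed random weights, never updated
scores = [0.0] * 8                                 # sigmoid(0) = 0.5 for every entry
x = [1.0] * 8
for _ in range(20):
    scores, mask = local_step(weights, scores, x, target=1.0, lr=0.1, rng=rng)

print(mask)  # the binary payload a client would upload: one bit per parameter
```

Only the scores change during training, and only the sampled binary mask would need to be sent, which is where the bound of 1 bit per parameter comes from before any entropy coding of a sparse mask.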

Paper Structure

This paper contains 7 sections, 14 equations, 2 figures.

Figures (2)

  • Figure 1: From left to right: CIFAR-10, MNIST, and CIFAR-100 experiments. First row: validation accuracy vs. rounds. Second row: the corresponding average bits per parameter (Bpp) required vs. rounds.
  • Figure 2: Trade-off between validation accuracy and average Bpp for different values of the regularization weight $\lambda$ in the non-IID CIFAR-10 and MNIST settings.