Masked Random Noise for Communication Efficient Federated Learning

Shiwei Li; Yingyi Cheng; Haozhao Wang; Xing Tang; Shijie Xu; Weihong Luo; Yuhua Li; Dugang Liu; Xiuqiang He; Ruixuan Li

TL;DR

The paper tackles the high uplink communication cost of federated learning by representing local updates as masked random noise. It introduces FedMRN, in which each client learns a 1-bit mask per parameter and applies it to noise produced by a predefined random noise generator; the client then transmits only the masks and a random seed. A progressive stochastic masking strategy guides mask optimization during local training. The authors prove convergence guarantees for both strongly convex and non-convex objectives, so the uplink cost is 1 bit per parameter plus the seed. Experiments on four datasets show that FedMRN converges faster and reaches higher test accuracy than several baselines that rely on post-training compression, while attaining accuracy similar to FedAvg. The work highlights the viability of learning to compress model updates directly within the local training loop and suggests broad applicability of masked random noise in communication-efficient FL.

Abstract

Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinder training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we ask the distributed clients to find optimal model updates relative to the global model parameters within predefined random noise. For this purpose, we propose Federated Masked Random Noise (FedMRN), a novel framework that enables clients to learn a 1-bit mask for each model parameter and apply masked random noise (i.e., the Hadamard product of random noise and masks) to represent model updates. To make FedMRN feasible, we propose an advanced mask training strategy, called progressive stochastic masking (PSM). After local training, each client only needs to transmit its local masks and a random seed to the server. Additionally, we provide theoretical guarantees for the convergence of FedMRN under both strongly convex and non-convex assumptions. Extensive experiments are conducted on four popular datasets. The results show that FedMRN exhibits superior convergence speed and test accuracy compared to relevant baselines, while attaining a similar level of accuracy as FedAvg.
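To make the uplink format concrete, the following is a minimal NumPy sketch of how a masked-noise update could be packed on a client and rebuilt on the server. It is an illustration under stated assumptions, not the authors' implementation: the function names, the uniform noise distribution, and the `scale` value are hypothetical, and the mask here is drawn at random only as a stand-in for a mask learned with PSM.

```python
import numpy as np

def generate_noise(seed, shape, scale=0.01):
    # Assumed noise generator: uniform noise in [-scale, scale].
    # Client and server must use the same generator and the same seed.
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=shape)

def client_payload(mask, seed):
    # Pack the binary mask into bits: 1 bit per parameter, plus the seed.
    packed = np.packbits(mask.astype(np.uint8))
    return packed, seed, mask.shape

def server_reconstruct(packed, seed, shape, scale=0.01):
    # Regenerate the noise from the seed and apply the unpacked mask
    # (Hadamard product of noise and mask).
    n = int(np.prod(shape))
    mask = np.unpackbits(packed, count=n).reshape(shape)
    noise = generate_noise(seed, shape, scale)
    return mask * noise

# Usage: a stand-in mask for a layer with 1000 parameters.
shape = (10, 100)
seed = 1234
mask = np.random.default_rng(0).integers(0, 2, size=shape)
packed, seed_out, shape_out = client_payload(mask, seed)
update = server_reconstruct(packed, seed_out, shape_out)
```

In this sketch the payload for 1000 parameters is 125 bytes of mask bits plus one integer seed, compared with 4000 bytes for a float32 update.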

Paper Structure (37 sections, 8 theorems, 66 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Lottery Tickets and Supermasks
Supermasks for Federated Learning
Communication Compression
Methodology
Problem Formulation
Progressive Stochastic Masking
Stochastic Masking
Progressive Masking
Federated Masked Random Noise
Convergence Analysis
Experiments
Experimental Setup
Datasets and Models.
...and 22 more sections

Key Result

Theorem 1

(Strongly convex.) Let the assumptions referenced as asmp:lsmooth through asmp:convex hold. Choose $\kappa = L/\mu$, $\gamma = \max\{8\kappa, S\} - 1$, and the learning rate $\eta_t = \frac{2}{\mu(\gamma+t)}$. If the noise is generated from the Bernoulli distribution over $\{-2\eta_0 SG, 2\eta_0 SG\}$, then FedMRN satisfies a convergence bound (the inequality itself is not preserved in this extract), where $B = \frac{\sigma^2}{N} + 6L\Gamma + 8(1+q^2)(S-1)^2G^2 + 4\frac{q^2(N-1)+N-K}{K(N-1)}S^2G^2$.
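The quantities named in Theorem 1 can be evaluated numerically. The sketch below computes $\kappa$, $\gamma$, the learning-rate schedule $\eta_t$, the Bernoulli noise magnitude $2\eta_0 SG$, and the constant $B$ exactly as written above. The function names and the example values are hypothetical, chosen only to show the arithmetic; the bound itself is not reproduced because it is not available here.

```python
def theorem1_constants(L, mu, S, sigma2, N, K, Gamma, q, G):
    # kappa = L / mu and gamma = max{8 * kappa, S} - 1, as in the theorem.
    kappa = L / mu
    gamma = max(8 * kappa, S) - 1
    # B is the sum of the four terms in the theorem statement.
    B = (
        sigma2 / N
        + 6 * L * Gamma
        + 8 * (1 + q**2) * (S - 1) ** 2 * G**2
        + 4 * (q**2 * (N - 1) + N - K) / (K * (N - 1)) * S**2 * G**2
    )
    return kappa, gamma, B

def learning_rate(t, mu, gamma):
    # eta_t = 2 / (mu * (gamma + t))
    return 2.0 / (mu * (gamma + t))

# Example values (illustrative only, not taken from the paper).
kappa, gamma, B = theorem1_constants(
    L=4.0, mu=1.0, S=5, sigma2=2.0, N=10, K=5, Gamma=0.0, q=0.0, G=1.0
)
eta_0 = learning_rate(0, mu=1.0, gamma=gamma)
noise_magnitude = 2 * eta_0 * 5 * 1.0  # 2 * eta_0 * S * G
```

With these inputs, $\kappa = 4$, $\gamma = 31$, and the noise takes the values $\pm 20/31$.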

Figures (6)

Figure 1: An illustration of FedMRN. $\mathcal{G}$ is a random noise generator. In the example, each binary mask $m \in \{0,1\}$, therefore the masked random noise is sparse. It is worth noting that the mask can also take values from $\{-1, 1\}$, i.e., the signed mask. In such case, the presence of a dotted line indicates changing the sign of the corresponding noise, rather than pruning it off.
Figure 2: Schematic diagram of SM and PM. In subfigure (b), $\tau$ is the number of current local iterations, and $S$ is the total number of local iteration steps. $p$ will increase to 1 as training progresses, so that each element of the model updates will eventually be mapped into masked noise.
Figure 3: Convergence curves under the Non-IID-2 data distribution.
Figure 4: Results of ablation studies.
Figure 5: The accuracy of FedMRN with different random noise. The horizontal axis represents the noise magnitude.
...and 1 more figure
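The Figure 2 caption describes two ingredients: stochastic masking (SM), which maps an update element to masked noise, and progressive masking (PM), which applies that mapping with a probability $p$ that rises to 1 over the $S$ local steps. A rough NumPy sketch of this idea follows. Several details are assumptions rather than the paper's definitions: the linear schedule $p = \tau/S$, the binary mask with $P(m=1) = \mathrm{clip}(u/n, 0, 1)$ (a stochastic-rounding rule), and all function names.

```python
import numpy as np

def stochastic_mask(update, noise, rng):
    # Assumed SM rule: sample m in {0, 1} with P(m = 1) = clip(u / n, 0, 1),
    # so that E[m * n] = u whenever u / n lies in [0, 1].
    ratio = np.divide(update, noise, out=np.zeros_like(update), where=noise != 0)
    prob = np.clip(ratio, 0.0, 1.0)
    return (rng.random(update.shape) < prob).astype(update.dtype)

def progressive_stochastic_masking(update, noise, tau, S, rng):
    # Assumed PM schedule: p = tau / S, reaching 1 at the last local step.
    p = tau / S
    mask = stochastic_mask(update, noise, rng)
    # Each element is replaced by its masked noise with probability p;
    # otherwise it keeps its real-valued update.
    use_masked = rng.random(update.shape) < p
    return np.where(use_masked, mask * noise, update), mask

# Usage: at tau = S every element is mapped to masked noise.
rng = np.random.default_rng(0)
noise = rng.uniform(-0.01, 0.01, size=1000)
update = rng.normal(0.0, 0.005, size=1000)
start, _ = progressive_stochastic_masking(update, noise, tau=0, S=10, rng=rng)
final, final_mask = progressive_stochastic_masking(update, noise, tau=10, S=10, rng=rng)
```

At `tau=0` the update passes through unchanged, and at `tau=S` the result equals `final_mask * noise`, matching the caption's statement that every element is eventually mapped into masked noise.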

Theorems & Definitions (10)

Theorem 1
Theorem 2
Proposition 1
Remark 1
Remark 2
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
