Masked Random Noise for Communication Efficient Federated Learning
Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li
TL;DR
The paper tackles the high uplink communication cost of federated learning by representing local updates as masked random noise. It introduces FedMRN, in which each client learns a 1-bit mask per parameter and applies it to noise drawn from a predefined random generator; the client then transmits only the masks and the random seed. A progressive stochastic masking (PSM) strategy guides mask optimization during local training. The authors prove convergence guarantees for both strongly convex and non-convex objectives and show that the uplink cost falls to 1 bit per parameter. Experiments on four datasets show that FedMRN converges faster and reaches higher test accuracy than baselines that rely on post-training compression, while attaining accuracy similar to FedAvg. The work highlights the viability of learning to compress model updates directly within the local training loop and suggests that masked random noise may apply broadly to communication-efficient FL.
Abstract
Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinder training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we ask the distributed clients to find optimal model updates, relative to the global model parameters, within predefined random noise. For this purpose, we propose Federated Masked Random Noise (FedMRN), a novel framework that enables clients to learn a 1-bit mask for each model parameter and apply masked random noise (i.e., the Hadamard product of random noise and masks) to represent model updates. To make FedMRN feasible, we propose an advanced mask training strategy called progressive stochastic masking (PSM). After local training, each client only needs to transmit its local masks and a random seed to the server. Additionally, we provide theoretical guarantees for the convergence of FedMRN under both strongly convex and non-convex assumptions. Extensive experiments are conducted on four popular datasets. The results show that FedMRN exhibits superior convergence speed and test accuracy compared to relevant baselines, while attaining a level of accuracy similar to that of FedAvg.
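
The transmit-and-reconstruct step described in the abstract can be illustrated with a minimal sketch, assuming NumPy. It only shows how a 1-bit mask plus a seed is enough for the server to rebuild a masked-noise update as the Hadamard product of noise and mask. The function names, the uniform noise distribution, the `scale` value, and the sign-agreement rule used to pick the mask are all illustrative assumptions; in particular, FedMRN learns its masks during local training with PSM, which this sketch does not reproduce.

```python
import numpy as np

def make_noise(seed, shape, scale=0.01):
    # Client and server call this with the same seed, so both obtain
    # identical noise without ever transmitting it.
    # Uniform noise and `scale` are assumptions made for illustration.
    rng = np.random.default_rng(seed)
    return rng.uniform(-scale, scale, size=shape)

def client_encode(update, seed, scale=0.01):
    # Stand-in mask rule (NOT the paper's PSM): keep a noise entry only
    # when its sign agrees with the client's desired update.
    noise = make_noise(seed, update.shape, scale)
    mask = np.sign(noise) == np.sign(update)
    # Pack the boolean mask into bytes: 1 bit per parameter on the uplink.
    return np.packbits(mask), seed

def server_decode(packed_mask, seed, shape, scale=0.01):
    # Regenerate the noise from the seed and apply the received mask
    # elementwise (Hadamard product) to recover the masked update.
    noise = make_noise(seed, shape, scale)
    n_params = int(np.prod(shape))
    mask = np.unpackbits(packed_mask, count=n_params).reshape(shape)
    return noise * mask

# Toy usage: a 4x5 "layer" with a made-up local update.
shape = (4, 5)
local_update = np.random.default_rng(0).normal(size=shape)
packed, seed = client_encode(local_update, seed=1234)
recovered = server_decode(packed, seed, shape)
print(packed.nbytes, "bytes sent for", local_update.size, "parameters")
```

In this toy run the uplink payload is the packed mask (one bit per parameter, rounded up to whole bytes) plus a single integer seed, compared with 32 bits per parameter for an uncompressed float32 update.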
