Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method
Elissa Mhanna, Mohamad Assaad
TL;DR
This work tackles cross-device federated learning over wireless links where uplink bandwidth is a bottleneck. It introduces 1P-ZOFL, a doubly communication-efficient zero-order method that uses a one-point gradient estimator and restricts each device to scalar communications, while embedding the wireless channel into the learning process. The authors prove almost-sure convergence for the nonconvex setting and establish a rate bound of $O(K^{-1/3+\epsilon})$, supported by experiments on MNIST showing robustness to channel noise and data heterogeneity. The approach offers significant practical gains by eliminating the need for channel estimation and reducing communication to two scalars per device per round, making large-scale wireless FL more feasible and scalable.
Abstract
Cross-device federated learning (FL) is a growing machine learning setting whereby multiple edge devices collaborate to train a model without disclosing their raw data. As more mobile devices join FL applications over wireless links, the limited uplink capacity of these devices becomes a critical bottleneck for practical deployment. In this work, we propose a novel doubly communication-efficient zero-order (ZO) method with a one-point gradient estimator that replaces communicating long vectors with scalar values and that harnesses the nature of the wireless communication channel, overcoming the need to know the channel state coefficient. It is the first method that includes the wireless channel in the learning algorithm itself instead of spending resources to analyze it and remove its impact. We then offer a thorough analysis of the proposed zero-order federated learning (ZOFL) framework and prove that our method converges \textit{almost surely}, which is a novel result in nonconvex ZO optimization. We further prove a convergence rate of $O(\frac{1}{\sqrt[3]{K}})$ in the nonconvex setting. We finally demonstrate the potential of our algorithm with experimental results.
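To make the idea of a one-point gradient estimator concrete, the following is a minimal sketch of a generic textbook one-point zero-order estimate, not the 1P-ZOFL algorithm itself. It assumes a single device, a noiseless link, and a random direction drawn uniformly from the unit sphere; the wireless channel and the multi-device aggregation described above are deliberately left out. All function names (`one_point_estimate`, `averaged_estimate`, `quadratic`) and the smoothing parameter `gamma` are hypothetical and chosen only for illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_point_estimate(loss, theta, gamma, rng):
    """Generic one-point zero-order gradient estimate (illustrative only).

    A single loss evaluation is used, so a device would only need to
    report the scalar `value` rather than a full gradient vector.
    """
    dim = len(theta)
    phi = random_unit_vector(dim, rng)
    perturbed = [t + gamma * p for t, p in zip(theta, phi)]
    value = loss(perturbed)  # the one scalar a device would communicate
    scale = dim / gamma * value
    return [scale * p for p in phi]


def averaged_estimate(loss, theta, gamma, n_samples, seed=0):
    """Average many one-point estimates to reduce their (large) variance."""
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(n_samples):
        g = one_point_estimate(loss, theta, gamma, rng)
        total = [a + b for a, b in zip(total, g)]
    return [a / n_samples for a in total]


def quadratic(x):
    """Toy loss with known gradient 2x, used only to sanity-check the sketch."""
    return sum(v * v for v in x)
```

For the toy quadratic loss, the expectation of this estimate equals the true gradient $2\theta$, so averaging many samples at a fixed point should approach it. A single estimate is very noisy, which is one reason convergence rates for one-point methods tend to be slower than for first-order methods.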
