Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method

Elissa Mhanna; Mohamad Assaad

TL;DR

This work tackles cross-device federated learning over wireless links where uplink bandwidth is a bottleneck. It introduces 1P-ZOFL, a doubly communication-efficient zero-order method that uses a one-point gradient estimator and restricts each device to scalar communications, while embedding the wireless channel into the learning process. The authors prove almost-sure convergence for the nonconvex setting and establish a rate bound of $O(K^{-1/3+\epsilon})$, supported by experiments on MNIST showing robustness to channel noise and data heterogeneity. The approach offers significant practical gains by eliminating the need for channel estimation and reducing communication to two scalars per device per round, making large-scale wireless FL more feasible and scalable.

Abstract

Cross-device federated learning (FL) is a growing machine learning setting in which multiple edge devices collaborate to train a model without disclosing their raw data. As more mobile devices join FL applications over wireless links, the limited uplink capacity of these devices becomes a critical bottleneck for practical deployment. In this work, we propose a novel doubly communication-efficient zero-order (ZO) method with a one-point gradient estimator that replaces the communication of long vectors with scalar values and harnesses the nature of the wireless communication channel, removing the need to know the channel state coefficient. It is the first method that includes the wireless channel in the learning algorithm itself instead of spending resources to analyze it and remove its impact. We then offer a thorough analysis of the proposed zero-order federated learning (ZOFL) framework and prove that our method converges almost surely, which is a novel result in nonconvex ZO optimization. We further prove a convergence rate of $O(\frac{1}{\sqrt[3]{K}})$ in the nonconvex setting. We finally demonstrate the potential of our algorithm with experimental results.
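To make the idea of a one-point gradient estimator concrete, here is a minimal sketch in Python. It is a textbook one-point zero-order estimate on a toy problem, not the 1P-ZOFL algorithm itself: the quadratic `toy_loss`, the smoothing radius `gamma`, the unit-sphere perturbation, and all function names are assumptions made for illustration, and the wireless channel is not modeled at all.

```python
import math
import random


def sample_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def one_point_estimate(loss, theta, gamma, rng):
    """Textbook one-point estimate: (d / gamma) * loss(theta + gamma * u) * u.

    Only one scalar, the perturbed loss value, is requested from the loss oracle.
    """
    dim = len(theta)
    u = sample_unit_vector(dim, rng)
    perturbed = [t + gamma * ui for t, ui in zip(theta, u)]
    value = loss(perturbed)  # the single scalar a device would need to report
    return [(dim / gamma) * value * ui for ui in u]


def toy_loss(x):
    """Hypothetical quadratic loss 0.5 * ||x||^2, whose exact gradient is x."""
    return 0.5 * sum(xi * xi for xi in x)


def averaged_estimate(theta, gamma=0.5, num_samples=200_000, seed=0):
    """Average many one-point estimates to expose their expectation."""
    rng = random.Random(seed)
    total = [0.0] * len(theta)
    for _ in range(num_samples):
        g = one_point_estimate(toy_loss, theta, gamma, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


theta = [1.0, -2.0]
estimate = averaged_estimate(theta)  # should land close to the true gradient [1.0, -2.0]
```

Each call to `one_point_estimate` evaluates the loss exactly once, which is why a single scalar is enough to form the update direction when the perturbation is known to the party assembling the estimate. The averaging here only serves to show what the estimate targets in expectation; a single sample is very noisy, and for general losses the estimate is biased with respect to the exact gradient.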

Paper Structure

This paper contains 20 sections, 7 theorems, 31 equations, 3 figures, 1 algorithm.

Introduction
Motivation for Our Work
Contribution
Algorithm
The 1P-ZOFL Algorithm
The Estimated Gradient
Convergence analysis
1P-ZOFL convergence
Experimental results
Conclusion
On the channel model
On the Estimated Gradient
Proof of Lemma 1: Biased Estimator
Proof of the Lemma on the Expected Norm Squared of the Estimated Gradient
Proof of the Lemma on the Norm of the Bias
...and 5 more sections

Key Result

Lemma 1

Let the assumptions on the noise and on the perturbation be satisfied, and define the scalar $c_1=\beta_1\frac{K_{hh}}{\sigma_h^4}$. Then the gradient estimator is biased with respect to the exact gradient $\nabla F(\theta)$ of the objective function. Concretely, $\mathbb{E}[g_k \mid \mathcal{H}_k] = c_1\gamma_k(\nabla F(\theta_k) \dots$

Figures (3)

Figure 1: Federated learning over wireless networks.
Figure 2: Accuracy evolution of 1P-ZOFL vs. FedAvg for IID data and non-IID distribution.
Figure 3: Accuracy evolution of 1P-ZOFL for $\sigma_n^2 \in \{0.25, 1, 2.25, 10.0489\}$.

Theorems & Definitions (9)

Example 1
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Example 2
Lemma 5
Theorem 1
Theorem 2
