Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off

Muhammad Faraz Ul Abrar, Nicolò Michelusi

TL;DR

This paper tackles non-convex OTA-FL under wireless heterogeneity by relaxing the zero-bias constraint and introducing a structured, time-invariant bias in gradient aggregation. It derives a finite-time stationarity bound that exposes a bias-variance trade-off governed by OTA pre-scalers and participation weights, and then optimizes these parameters via a non-convex joint power-control problem solved with a successive convex approximation algorithm using only statistical CSI. The proposed SCA method achieves faster convergence and better generalization than zero-bias baselines while reducing the need for instantaneous CSI at the base station. Experiments on a non-convex MNIST task validate the theory, showing that controlled bias can substantially improve OTA-FL performance in heterogeneous wireless environments.

Abstract

Over-the-air (OTA) federated learning (FL) is widely recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single channel use. Existing OTA-FL designs largely enforce zero-bias model updates, either by assuming homogeneous wireless conditions (equal path loss across devices) or by forcing zero-bias updates to guarantee convergence. Under heterogeneous wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced-variance updates. We derive a finite-time stationarity bound (on the expected time-averaged squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
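
The bias-variance trade-off described above can be illustrated with a deliberately simplified toy model. The sketch below is not the paper's scheme: it assumes scalar gradients, fixed and perfectly known channel gains, and a per-device amplitude cap, and every name and number in it (`ota_estimate_stats`, `gains`, `max_amp`, and so on) is hypothetical. It compares a zero-bias design, in which every device inverts its channel and is therefore aligned to the weakest device, with a biased design in which every device transmits at full amplitude.

```python
# Toy illustration only; NOT the paper's algorithm.
# All names and numbers are hypothetical assumptions.

def ota_estimate_stats(grads, gains, amplitudes, noise_std):
    """Return (bias, noise_variance) of a scalar OTA aggregate.

    Assumed received signal:
        y = sum_n gains[n] * amplitudes[n] * grads[n] + z,  z ~ N(0, noise_std^2)
    Assumed server estimate: y divided by sum_n gains[n] * amplitudes[n],
    so the effective aggregation weights sum to one.
    """
    effective = [h * a for h, a in zip(gains, amplitudes)]
    total = sum(effective)
    weights = [e / total for e in effective]
    target = sum(grads) / len(grads)  # ideal uniform average
    bias = sum(w * g for w, g in zip(weights, grads)) - target
    noise_var = (noise_std / total) ** 2  # receiver noise after rescaling
    return bias, noise_var


grads = [1.0, 0.5, -0.5, 2.0]   # local gradients (scalars for simplicity)
gains = [1.0, 0.8, 0.5, 0.1]    # heterogeneous channel gains; last device is weak
max_amp = 1.0                   # per-device amplitude cap
noise_std = 0.5                 # receiver noise standard deviation

# Zero-bias design: channel inversion, aligned to the weakest device.
common = min(gains) * max_amp
zero_bias_amps = [common / h for h in gains]
zb_bias, zb_var = ota_estimate_stats(grads, gains, zero_bias_amps, noise_std)

# Biased design: every device transmits at full amplitude.
full_amps = [max_amp] * len(gains)
fp_bias, fp_var = ota_estimate_stats(grads, gains, full_amps, noise_std)

zb_mse = zb_bias ** 2 + zb_var
fp_mse = fp_bias ** 2 + fp_var
print(f"zero-bias : bias={zb_bias:+.4f}  var={zb_var:.4f}  mse={zb_mse:.4f}")
print(f"full-power: bias={fp_bias:+.4f}  var={fp_var:.4f}  mse={fp_mse:.4f}")
```

With these particular numbers, the full-amplitude design accepts a nonzero bias but has a much smaller noise variance, so its mean squared error is lower. Other gains, gradients, or noise levels can reverse the comparison, which is why a design that tunes the bias rather than forcing it to zero can be worthwhile.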

Paper Structure

This paper contains 9 sections, 1 theorem, 20 equations, 2 figures.

Key Result

Theorem 1

Under Assumptions 1-4 and a fixed learning step size $0 < \eta \le 1/L$, after $T$ FL rounds it holds that where $\zeta$ is the gradient estimation variance, bounded as:

Figures (2)

  • Figure 1: Non-convex OTA-FL in a wireless heterogeneous setup.
  • Figure 2: Comparison of various OTA-FL schemes, $N=10$ devices. The shared legend in (a) indicates each method’s CSI requirement at the PS for OTA power control design.

Theorems & Definitions (1)

  • Theorem 1