Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off
Muhammad Faraz Ul Abrar, Nicolò Michelusi
TL;DR
This paper tackles non-convex OTA-FL under wireless heterogeneity by relaxing the zero-bias constraint and introducing a structured, time-invariant bias in gradient aggregation. It derives a finite-time stationarity bound that exposes a bias-variance trade-off governed by OTA pre-scalers and participation weights, and then optimizes these parameters via a non-convex joint power-control problem solved with a successive convex approximation algorithm using only statistical CSI. The proposed SCA method achieves faster convergence and better generalization than zero-bias baselines while reducing the need for instantaneous CSI at the base station. Experiments on a non-convex MNIST task validate the theory, showing that controlled bias can substantially improve OTA-FL performance in heterogeneous wireless environments.
Abstract
Over-the-air (OTA) federated learning (FL) is widely recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel to aggregate model updates in a single channel use. Existing OTA-FL designs largely enforce zero-bias model updates, either by assuming homogeneous wireless conditions (equal path loss across devices) or by explicitly forcing zero-bias updates to guarantee convergence. Under heterogeneous wireless scenarios, however, such designs are constrained by the weakest device and inflate the update variance. Moreover, prior analyses of biased OTA-FL largely address convex objectives, while most modern AI models are highly non-convex. Motivated by these gaps, we study OTA-FL with stochastic gradient descent (SGD) for general smooth non-convex objectives under wireless heterogeneity. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while enabling reduced-variance updates. We derive a finite-time stationarity bound (on the expected time-averaged squared gradient norm) that explicitly reveals a bias-variance trade-off. To optimize this trade-off, we pose a non-convex joint OTA power-control design problem and develop an efficient successive convex approximation (SCA) algorithm that requires only statistical CSI at the base station. Experiments on a non-convex image classification task validate the approach: the SCA-based design accelerates convergence via an optimized bias and improves generalization over prior OTA-FL baselines.
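
To make the bias-variance trade-off concrete, the following is a minimal sketch under a deliberately simplified toy model that is an assumption of this example, not the paper's system model. Each device scales its gradient by an amplitude capped by its channel quality, the server normalizes the superposed signal, and the function reports the resulting squared bias and noise variance. The names `ota_tradeoff`, `a_max`, `grads`, and `sigma`, and all numeric values, are hypothetical.

```python
# Toy model (an assumption, not the paper's system model): device n sends its
# local gradient scaled by an amplitude a[n] that cannot exceed a_max[n], a cap
# set by its power budget and average path gain. The server receives
# sum_n a[n] * g[n] + noise and normalizes the result by sum(a).

def ota_tradeoff(a, grads, sigma):
    """Return (squared bias, per-coordinate noise variance) of the estimate."""
    n = len(grads)
    dim = len(grads[0])
    total = sum(a)
    weights = [a_n / total for a_n in a]  # effective participation weights
    # Bias: gap between the weighted aggregate and the uniform average.
    bias = [
        sum((weights[i] - 1.0 / n) * grads[i][d] for i in range(n))
        for d in range(dim)
    ]
    bias_sq = sum(b * b for b in bias)
    noise_var = (sigma / total) ** 2  # receiver noise after normalization
    return bias_sq, noise_var


a_max = [1.0, 0.8, 0.1]  # hypothetical amplitude caps; device 3 has a weak channel
grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical local gradients
sigma = 0.5  # hypothetical receiver noise standard deviation

# Zero-bias design: every device is held to the weakest device's amplitude.
zero_bias = ota_tradeoff([min(a_max)] * len(a_max), grads, sigma)
# Biased design: every device transmits at its own cap.
biased = ota_tradeoff(a_max, grads, sigma)

print("zero-bias (bias^2, noise var):", zero_bias)
print("biased    (bias^2, noise var):", biased)
```

In this toy setting the zero-bias design has essentially no bias but pays a large noise variance because it is pinned to the weakest device, whereas letting each device use its own cap introduces a fixed bias and shrinks the noise term. Choosing amplitudes between these two extremes is the kind of design question the power-control problem in the abstract addresses.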
