Scalable Hierarchical Over-the-Air Federated Learning
Seyed Mohammad Azimi-Abarghouyi, Viktoria Fodor
TL;DR
This work tackles scalability, interference, and data heterogeneity in hierarchical federated learning over wireless networks by introducing MultiAirFed, a two-level learning method that combines intra-cluster gradient aggregation with inter-cluster model-parameter aggregation. A scalable clustered over-the-air uplink and a bandwidth-limited analog downlink transmission scheme are proposed to operate on a single wireless resource block, with optimized receiver normalizing factors to mitigate distortion. The authors model device locations with a Poisson cluster process, derive a tractable convergence bound for multi-cluster networks, and quantify uplink/downlink interference effects. Experimental results on MNIST and CIFAR-10 show that MultiAirFed outperforms conventional HierFed under realistic interference and non-i.i.d. data, supporting the approach’s potential for large-scale edge learning deployments.
Abstract
When implementing hierarchical federated learning over wireless networks, ensuring scalability and handling both interference and device data heterogeneity are crucial. This work introduces a new two-level learning method designed to address these challenges, along with a scalable over-the-air aggregation scheme for the uplink and a bandwidth-limited broadcast scheme for the downlink that efficiently use a single wireless resource. To provide resistance against data heterogeneity, we employ gradient aggregation. Meanwhile, the impact of uplink and downlink interference is minimized through optimized receiver normalizing factors. We present a comprehensive mathematical approach to derive the convergence bound for the proposed algorithm, applicable to a multi-cluster wireless network with any number of collaborating clusters, and provide special cases and design remarks. As a key step to enable a tractable analysis, we develop a spatial model for the setup by modeling devices as a Poisson cluster process over the edge servers and rigorously quantify uplink and downlink error terms due to the interference. Finally, we show that despite the interference and data heterogeneity, the proposed algorithm not only achieves high learning accuracy for a variety of parameters but also significantly outperforms the conventional hierarchical learning algorithm.
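The two-level structure described above (gradient aggregation within each cluster, model-parameter aggregation across clusters) can be illustrated with a small sketch. This is not the authors' implementation. It assumes ideal, noise-free aggregation, so the over-the-air channel, interference, and receiver normalizing factors are all omitted. The toy quadratic loss, the function names, the learning rate, and the number of intra-cluster rounds are hypothetical choices made only for illustration.

```python
from statistics import fmean


def local_gradient(w, target):
    """Gradient of the toy loss 0.5 * ||w - target||^2 at w (assumed loss)."""
    return [wi - ti for wi, ti in zip(w, target)]


def average(vectors):
    """Element-wise mean of equal-length vectors (ideal, noise-free aggregation)."""
    return [fmean(column) for column in zip(*vectors)]


def two_level_round(w_global, clusters, lr=0.5, intra_rounds=2):
    """One global round: intra-cluster gradient steps, then inter-cluster model averaging."""
    cluster_models = []
    for targets in clusters:
        w = list(w_global)  # each edge server starts from the current global model
        for _ in range(intra_rounds):
            # Level 1: the edge server aggregates the gradients of its own devices.
            grad = average([local_gradient(w, t) for t in targets])
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
        cluster_models.append(w)
    # Level 2: cluster models (parameters, not gradients) are averaged across clusters.
    return average(cluster_models)


def train(clusters, dim, global_rounds=20):
    """Run several global rounds starting from the all-zero model."""
    w = [0.0] * dim
    for _ in range(global_rounds):
        w = two_level_round(w, clusters)
    return w


# Two clusters with deliberately different (non-i.i.d.) device targets.
clusters = [
    [[1.0, 0.0], [3.0, 0.0]],
    [[0.0, 5.0], [0.0, 7.0]],
]
w_final = train(clusters, dim=2)
```

With equally sized clusters, this toy setup converges to the mean of all device targets, even though each cluster on its own would pull the model toward a different point. A wireless version would replace the ideal `average` with a noisy, interference-affected sum scaled by a receiver normalizing factor.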
