Federated Learning over Hierarchical Wireless Networks: Training Latency Minimization via Submodel Partitioning
Wenzhi Fang, Dong-Jun Han, Christopher G. Brinton
TL;DR
This paper tackles the scalability and latency challenges of hierarchical federated learning (HFL) on resource-constrained wireless networks by introducing HIST, which splits the global model into disjoint submodels in each round and assigns each submodel to a distinct cell (a group of clients) for training. It provides convergence guarantees for non-convex losses under non-i.i.d. data, derives a latency-aware submodel partitioning strategy, and extends the framework with over-the-air computation (AirComp) to further reduce edge aggregation latency. The authors validate HIST on fully connected and convolutional networks, showing substantial reductions in training time and communication cost while maintaining accuracy, with AirComp-assisted HIST offering additional latency gains under realistic wireless conditions. The work advances practical FL in multi-layer networks and opens paths to applying submodel partitioning to LoRA-based fine-tuning of transformers in edge settings.
Abstract
Hierarchical federated learning (HFL) has demonstrated promising scalability advantages over traditional federated learning (FL) based on a "star-topology" architecture. However, HFL still imposes significant computation, communication, and storage burdens on the edge, especially when training a large-scale model over resource-constrained wireless devices. In this paper, we propose hierarchical independent submodel training (HIST), a new FL methodology that aims to address these issues in hierarchical cloud-edge-client networks. The key idea behind HIST is to divide the global model into disjoint partitions (or submodels) in each round so that each group of clients (i.e., each cell) is responsible for training only one partition of the model. We characterize the convergence behavior of HIST under mild assumptions, showing how several key attributes (e.g., submodel sizes, number of cells, and edge and global aggregation frequencies) affect the convergence rate and the stationarity gap. Building upon the theoretical results, we propose a submodel partitioning strategy that minimizes the training latency subject to network resource availability and a target learning performance guarantee. We then demonstrate how HIST can be augmented with over-the-air computation (AirComp) to further enhance the efficiency of model aggregation over the edge cells. Through numerical evaluations, we verify that HIST saves training time and communication costs by wide margins while achieving accuracy comparable to that of conventional HFL. Moreover, our experiments demonstrate that AirComp-assisted HIST provides further improvements in training latency.
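The per-round partitioning idea can be illustrated with a minimal sketch. It rests on assumptions that do not come from the paper: the model is treated as a flat parameter vector, the partition is a uniform random split (the paper instead derives a latency-aware partitioning), and `local_update` is a hypothetical stand-in for everything a cell does with its submodel (client training plus edge aggregation).

```python
import numpy as np

def partition_indices(num_params, num_cells, rng):
    """Randomly split parameter indices into num_cells disjoint groups.

    Assumption: uniform random split, used here only for illustration.
    """
    perm = rng.permutation(num_params)
    return np.array_split(perm, num_cells)

def hist_round(global_model, num_cells, local_update, rng):
    """Run one illustrative round: each cell updates only its own partition.

    local_update(cell, submodel) is a hypothetical callback that returns the
    cell's updated submodel, with the same shape as the input.
    """
    parts = partition_indices(global_model.size, num_cells, rng)
    new_model = global_model.copy()
    for cell, idx in enumerate(parts):
        submodel = global_model[idx]                   # cell receives only its partition
        new_model[idx] = local_update(cell, submodel)  # server writes the update back
    return new_model, parts

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    model = np.zeros(10)
    # Toy update: cell c adds (c + 1) to every parameter it owns.
    updated, parts = hist_round(model, 3, lambda c, w: w + (c + 1), rng)
    print([len(p) for p in parts])
    print(updated)
```

Because the partitions are disjoint and together cover every index, the global step in this sketch reduces to writing each cell's submodel back into its own slots, and each cell only ever handles roughly a 1/`num_cells` fraction of the parameters.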
