Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets
Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo
TL;DR
The paper tackles high communication costs in federated learning by proposing Fed-CHS, a sequential hierarchical FL framework that eliminates the central parameter server (PS) and trains the global model by passing it between adjacent edge servers. It provides convergence guarantees for both strongly convex and non-convex losses under non-IID data distributions, with rates tunable through the local-iteration parameter K and the step sizes, and demonstrates robustness to topology and data heterogeneity. Empirical results on MNIST, CIFAR-10, and CIFAR-100 show that Fed-CHS often surpasses baselines in test accuracy while substantially reducing communication overhead, especially when the data are highly non-IID. The approach is particularly relevant for distributed networked systems such as Internet of Vehicles (IoV) and LEO-satellite-terrestrial networks, where communication with a centralized PS is costly or impractical.
Abstract
In a real federated learning (FL) system, the communication overhead of passing model parameters between the clients and the parameter server (PS) is often a bottleneck. Hierarchical federated learning (HFL), which places multiple edge servers (ESs) between the clients and the PS, can partially alleviate this communication pressure but still requires the PS to aggregate model parameters from multiple ESs. To further reduce communication overhead, we bring sequential FL (SFL) into HFL for the first time. This removes the central PS and allows model training to be completed solely by passing the global model between two adjacent ESs in each iteration. We propose a novel algorithm adapted to this combined framework, referred to as Fed-CHS. Convergence results are derived for strongly convex and non-convex loss functions under various data heterogeneity setups, and they show convergence performance comparable to that of algorithms designed for HFL or SFL alone. Experimental results provide evidence that Fed-CHS outperforms baseline methods in both communication overhead savings and test accuracy.
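
To make the sequential, PS-free training pattern concrete, the following is a minimal sketch and not the Fed-CHS algorithm itself, whose exact update rule is not given in this summary. It assumes a scalar linear-regression loss, a fixed ring order over edge servers, gradient averaging across the clients of each ES, and K iterations per ES visit. All function names (`client_gradient`, `edge_server_round`, `sequential_hierarchical_train`, `make_clusters`) are hypothetical. The toy data are noiseless, so every cluster shares the same minimizer and only the feature ranges differ between clusters. This is a much milder form of heterogeneity than the non-IID settings the paper studies.

```python
import random

def client_gradient(w, data):
    """Mean gradient of 0.5 * (w*x - y)^2 over one client's samples."""
    return sum((w * x - y) * x for x, y in data) / len(data)

def edge_server_round(w, clients, K, lr):
    """One ES visit: K steps, each using the average of its clients' gradients."""
    for _ in range(K):
        grad = sum(client_gradient(w, d) for d in clients) / len(clients)
        w -= lr * grad
    return w

def sequential_hierarchical_train(clusters, rounds, K=5, lr=0.1, w0=0.0):
    """Pass the model from one ES to the next; no central server aggregates."""
    w = w0
    for t in range(rounds):
        es = t % len(clusters)  # assumed ring order over adjacent ESs
        w = edge_server_round(w, clusters[es], K, lr)
    return w

def make_clusters(true_w=2.0, seed=0):
    """Three clusters of four clients; each cluster samples x from its own range."""
    rng = random.Random(seed)
    clusters = []
    for lo, hi in [(0.5, 1.0), (1.0, 1.5), (1.5, 2.0)]:
        clients = []
        for _ in range(4):
            xs = [rng.uniform(lo, hi) for _ in range(20)]
            clients.append([(x, true_w * x) for x in xs])  # noiseless labels
        clusters.append(clients)
    return clusters

if __name__ == "__main__":
    w = sequential_hierarchical_train(make_clusters(), rounds=30)
    print(f"learned w = {w:.6f}")
```

In this sketch the only inter-server traffic per round is a single model hand-off from the current ES to the next one, which is the source of the communication saving the abstract describes; there is no step at which several ESs upload to a common aggregator.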
