Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets

Xingrun Yan, Shiyuan Zuo, Rongfei Fan, Han Hu, Li Shen, Puning Zhao, Yong Luo

TL;DR

The paper tackles high communication costs in federated learning by proposing Fed-CHS, a sequential hierarchical FL framework that eliminates the central parameter server and trains the global model by passing it between adjacent edge servers. It provides convergence guarantees for both strongly convex and non-convex losses under non-IID data distributions, with rates that can be tuned through the number of local iterations K and the step sizes, and demonstrates robustness to topology and data heterogeneity. Empirical results on MNIST, CIFAR-10, and CIFAR-100 show that Fed-CHS often surpasses baselines in test accuracy while substantially reducing communication overhead, particularly when data are highly non-IID. The approach is especially relevant for distributed networked systems such as the Internet of Vehicles (IoV) and LEO-satellite-terrestrial networks, where communication with a centralized PS is costly or impractical.

Abstract

In a real federated learning (FL) system, the communication overhead of passing model parameters between the clients and the parameter server (PS) is often a bottleneck. Hierarchical federated learning (HFL), which places multiple edge servers (ESs) between the clients and the PS, can partially alleviate this communication pressure but still requires the PS to aggregate model parameters from multiple ESs. To further reduce communication overhead, we bring sequential FL (SFL) into HFL for the first time: the central PS is removed, and training is completed solely by passing the global model between two adjacent ESs in each iteration. We propose a novel algorithm adapted to this combined framework, referred to as Fed-CHS. Convergence results are derived for strongly convex and non-convex loss functions under various data heterogeneity setups, and show convergence performance comparable to that of algorithms for HFL or SFL alone. Experimental results provide evidence that the proposed Fed-CHS outperforms baseline methods in both communication overhead savings and test accuracy.
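The training pattern described above (update the model inside one cluster, then hand it to an adjacent ES, with no central PS) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Fed-CHS algorithm: the toy least-squares data, the ring-shaped visiting order, the within-cluster rule (the ES averages client gradients for `K` steps), and all function names are hypothetical choices made for the example.

```python
import numpy as np

def make_clusters(num_clusters, clients_per_cluster, dim, rng):
    """Toy least-squares data. Each cluster draws inputs from a shifted
    distribution, a crude stand-in for non-IID data (assumption)."""
    w_true = rng.normal(size=dim)
    clusters = []
    for m in range(num_clusters):
        clients = []
        for _ in range(clients_per_cluster):
            X = rng.normal(loc=0.5 * m, size=(20, dim))
            y = X @ w_true
            clients.append((X, y))
        clusters.append(clients)
    return w_true, clusters

def client_gradient(w, X, y):
    """Gradient of the local loss 0.5 * mean((X w - y)^2)."""
    return X.T @ (X @ w - y) / len(y)

def cluster_update(w, clients, K, eta):
    """K ES-client interactions inside one cluster. Assumption: the ES
    averages the client gradients and takes one gradient step per interaction."""
    for _ in range(K):
        grads = [client_gradient(w, X, y) for X, y in clients]
        w = w - eta * np.mean(grads, axis=0)
    return w

def sequential_hierarchical_train(clusters, dim, rounds, K, eta):
    """Pass a single model from ES to ES; no parameter server aggregates.
    Assumption: ESs are visited in a fixed ring order."""
    w = np.zeros(dim)
    m = 0
    for _ in range(rounds):
        w = cluster_update(w, clusters[m], K, eta)
        m = (m + 1) % len(clusters)  # hand the model to the neighboring ES
    return w

def global_loss(w, clusters):
    """Average of the local losses over all clients in all clusters."""
    losses = [0.5 * np.mean((X @ w - y) ** 2)
              for clients in clusters for X, y in clients]
    return float(np.mean(losses))

rng = np.random.default_rng(0)
w_true, clusters = make_clusters(num_clusters=4, clients_per_cluster=5, dim=5, rng=rng)
w = sequential_hierarchical_train(clusters, dim=5, rounds=200, K=5, eta=0.02)
print("loss at zero init:", global_loss(np.zeros(5), clusters))
print("loss after training:", global_loss(w, clusters))
```

In this sketch only one model vector crosses an ES-to-ES link per round, which is the source of the communication saving the abstract refers to; the actual aggregation weights and ES selection rule used by Fed-CHS are not given in this summary.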

Paper Structure (33 sections, 7 theorems, 61 equations, 9 figures, 1 table, 1 algorithm)

Key Result

Theorem 4.1

Suppose ass:smooth, ass:convex, ass:gradient, and ass:heterogeneity hold. For any set of step sizes $\eta_k$ such that $\eta_k \leq \frac{1}{2LK}$ for all $k \in \mathcal{K}$, define [...], where $w^*$ is the global minimizer of the loss function $F(w)$ and $f_n^*$ is the minimum of $f_n(w)$. Then [...]
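As a small worked example of the step-size condition, take the purely illustrative values $L = 10$ and $K = 5$ (not from the paper), and assume that $L$ is the constant from the smoothness assumption and $K$ is the local-iteration parameter mentioned in the TL;DR:

```latex
\[
\eta_k \;\leq\; \frac{1}{2LK} \;=\; \frac{1}{2 \cdot 10 \cdot 5} \;=\; 0.01 .
\]
```

Under this reading, doubling $K$ to 10 halves the admissible step size to $0.005$, so more local iterations per step must be paired with smaller steps.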

Figures (9)

  • Figure 1: The framework of SFL in a hierarchical architecture. In this architecture, clients are divided into multiple clusters, each of which is managed by one ES. In each iteration step, the model parameters are first updated within one cluster through multiple interactions between the ES and its associated clients, and then migrated to a neighboring ES (cluster) for the next iteration step.
  • Figure 2: Results of communication overhead for different algorithms and datasets.
  • Figure 3: Results for different hyper-parameters with MLP or LeNet
  • Figure 4: Results for different data heterogeneity in ES
  • Figure 5: Convergence performance of Fed-CHS and baselines for different models and Dirichlet parameters on the MNIST dataset
  • ...and 4 more figures
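The captions above refer to Dirichlet parameters as the knob for data heterogeneity. A common way to build such non-IID splits in the FL literature is label-wise Dirichlet partitioning, sketched below. This summary does not state whether the paper uses exactly this procedure; the function name `dirichlet_partition` and its arguments are hypothetical.

```python
import numpy as np

def dirichlet_partition(labels, num_parts, alpha, rng):
    """Split sample indices into num_parts groups. For each class, the share
    given to each group is drawn from Dirichlet(alpha, ..., alpha); a smaller
    alpha gives more skewed (more non-IID) label distributions."""
    labels = np.asarray(labels)
    parts = [[] for _ in range(num_parts)]
    for c in np.unique(labels):
        # Shuffle the indices of class c, then cut them by sampled proportions.
        idx = rng.permutation(np.flatnonzero(labels == c))
        proportions = rng.dirichlet(alpha * np.ones(num_parts))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for part, chunk in zip(parts, np.split(idx, cuts)):
            part.extend(chunk.tolist())
    return [np.array(p, dtype=int) for p in parts]

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 30)  # toy labels: 10 classes, 30 samples each
parts = dirichlet_partition(labels, num_parts=4, alpha=0.5, rng=rng)
for i, p in enumerate(parts):
    print(i, np.bincount(labels[p], minlength=10))
```

Each printed row is the per-class sample count held by one group; lowering `alpha` concentrates each class in fewer groups.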

Theorems & Definitions (14)

  • Theorem 4.1
  • proof
  • Theorem 4.3
  • proof
  • Lemma G.1
  • proof
  • Lemma G.2
  • proof
  • Lemma G.3
  • proof
  • ...and 4 more