Semi-Decentralized Federated Edge Learning for Fast Convergence on Non-IID Data

Yuchang Sun, Jiawei Shao, Yuyi Mao, Jessie Hui Wang, Jun Zhang

TL;DR

By allowing model aggregation across different edge clusters, SD-FEEL retains the low training latency of FEEL while improving learning performance through access to richer training data from multiple edge clusters, and it achieves faster convergence than traditional federated learning architectures.

Abstract

Federated edge learning (FEEL) has emerged as an effective approach to reduce the large communication latency in cloud-based machine learning solutions, while preserving data privacy. Unfortunately, the learning performance of FEEL may be compromised due to limited training data in a single edge cluster. In this paper, we investigate a novel framework of FEEL, namely semi-decentralized federated edge learning (SD-FEEL). By allowing model aggregation across different edge clusters, SD-FEEL enjoys the benefit of FEEL in reducing the training latency, while improving the learning performance by accessing richer training data from multiple edge clusters. We present a training algorithm for SD-FEEL with three main procedures in each round, namely local model updates, intra-cluster model aggregation, and inter-cluster model aggregation, and prove that it converges on non-independent and identically distributed (non-IID) data. We also characterize how the network topology of the edge servers and the communication overhead of inter-cluster model aggregation jointly affect the training performance. Experimental results corroborate our analysis and demonstrate the effectiveness of SD-FEEL in achieving faster convergence than traditional federated learning architectures. In addition, we provide guidelines on choosing critical hyper-parameters of the training algorithm.
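
To make the three procedures named in the abstract concrete, the following is a minimal toy sketch, not the authors' algorithm. It assumes scalar models, a quadratic loss per device, equal-sized clusters, a ring topology among edge servers, and a fixed doubly stochastic mixing matrix; all function names and parameters (`local_steps`, `mix_rounds`, `self_weight`) are hypothetical, and how they map to the $\tau_1$, $\tau_2$ and $\alpha$ used in the figure captions is not stated in this excerpt.

```python
# Toy sketch of one SD-FEEL-style round. Everything here is an assumption
# made for illustration: scalar models, quadratic device losses, a ring of
# edge servers, and uniform (unweighted) averaging.

def local_update(w, target, lr, steps):
    """Run `steps` gradient steps on the toy loss 0.5 * (w - target) ** 2."""
    for _ in range(steps):
        w -= lr * (w - target)
    return w

def intra_cluster_aggregate(device_models):
    """An edge server averages the models of the devices in its cluster."""
    return sum(device_models) / len(device_models)

def ring_mixing_matrix(n, self_weight=0.5):
    """Symmetric, doubly stochastic mixing matrix for a ring of n servers."""
    side = (1.0 - self_weight) / 2.0
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][i] += self_weight
        P[i][(i - 1) % n] += side  # left neighbour
        P[i][(i + 1) % n] += side  # right neighbour
    return P

def inter_cluster_aggregate(server_models, P, rounds=1):
    """Each server mixes its model with its neighbours' models `rounds` times."""
    n = len(server_models)
    for _ in range(rounds):
        server_models = [
            sum(P[i][j] * server_models[j] for j in range(n)) for i in range(n)
        ]
    return server_models

def sd_feel_round(server_models, cluster_targets, P,
                  lr=0.1, local_steps=2, mix_rounds=1):
    """Local updates, then intra-cluster averaging, then inter-cluster mixing."""
    aggregated = []
    for w_server, targets in zip(server_models, cluster_targets):
        # Every device starts from its server's model and trains on local data.
        device_models = [local_update(w_server, t, lr, local_steps)
                         for t in targets]
        aggregated.append(intra_cluster_aggregate(device_models))
    return inter_cluster_aggregate(aggregated, P, mix_rounds)

if __name__ == "__main__":
    # Non-IID toy data: each cluster's devices have different optima.
    targets = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
    models = [0.0] * len(targets)
    P = ring_mixing_matrix(len(targets))
    for _ in range(50):
        models = sd_feel_round(models, targets, P)
    print(models)
```

In this toy setting the mixing step leaves the average of the server models unchanged because the matrix is doubly stochastic, so the inter-cluster step only spreads information between neighbouring clusters. A real system would typically weight the averages by local dataset size and operate on full parameter vectors.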

Paper Structure

This paper contains 22 sections, 10 theorems, 48 equations, 6 figures, 1 algorithm.

Key Result

Lemma 1

The local models evolve according to the following expression: where

Figures (6)

  • Figure 1: The semi-decentralized FEEL system.
  • Figure 2: Typical network topologies of the edge servers.
  • Figure 3: (a) Training loss and (b) test accuracy over time ($\tau_1=2$, $\tau_2=1$ and $\alpha=5$).
  • Figure 4: Training loss of SD-FEEL ($\tau_2=1$ and $\alpha=1$) over (a) iterations and (b) time.
  • Figure 5: Test accuracy over iterations ($\tau_1=5$, $\tau_2=5$, and $\alpha=1$ by default) with (a) different network topologies and (b) different values of $\alpha$.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Lemma 1
  • Lemma 2
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 3 more