TRAIL: Trust-Aware Client Scheduling for Semi-Decentralized Federated Learning

Gangqiang Hu, Jianfeng Lu, Jianmin Han, Shuqin Cao, Jing Liu, Hao Fu

TL;DR

The paper tackles dynamic client participation and unreliable communications in semi-decentralized Federated Learning (SD-FL). It introduces TRAIL, which combines an Adaptive Hidden Semi-Markov Model (AHSMM) for predicting client quality with a greedy client scheduling algorithm that minimizes global training loss under a joint client–server association framework. Key contributions include the design of AHSMM with multi-parameter fusion via MLLR, a convergence bound for the proposed scheduling, and experiments on MNIST, EMNIST, CIFAR-10, and SVHN showing an $8.7\%$ improvement in test accuracy and a $15.3\%$ reduction in training loss over baselines. The results indicate that TRAIL remains robust and efficient in SD-FL scenarios with heterogeneous and intermittently participating clients, enabling more reliable distributed learning with limited communication overhead.

Abstract

Due to the sensitivity of data, Federated Learning (FL) is employed to enable distributed machine learning while safeguarding data privacy and accommodating the requirements of various devices. However, in the context of semi-decentralized FL, clients' communication and training states are dynamic. This variability arises from local training fluctuations, heterogeneous data distributions, and intermittent client participation. Most existing studies primarily focus on stable client states, neglecting the dynamic challenges inherent in real-world scenarios. To tackle this issue, we propose a TRust-Aware clIent scheduLing mechanism called TRAIL, which assesses client states and contributions, enhancing model training efficiency through selective client participation. We focus on a semi-decentralized FL framework where edge servers and clients train a shared global model using unreliable intra-cluster model aggregation and inter-cluster model consensus. First, we propose an adaptive hidden semi-Markov model to estimate clients' communication states and contributions. Next, we address a client-server association optimization problem to minimize global training loss. Using convergence analysis, we propose a greedy client scheduling algorithm. Finally, our experiments conducted on real-world datasets demonstrate that TRAIL outperforms state-of-the-art baselines, achieving an improvement of 8.7% in test accuracy and a reduction of 15.3% in training loss.
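
To make the scheduling idea concrete, the following is a minimal sketch of a greedy client-server association, not the authors' algorithm. It assumes that each client already has a trust score in [0, 1] (standing in for the AHSMM estimate), that each client-server link has a reliability value, and that each edge server has a fixed capacity. The function name `greedy_schedule`, the `min_trust` threshold, and the utility `trust * link reliability` are all hypothetical; the paper derives its own objective from the convergence analysis.

```python
from typing import Dict, List, Tuple


def greedy_schedule(
    trust: Dict[str, float],
    link: Dict[Tuple[str, str], float],
    capacity: Dict[str, int],
    min_trust: float = 0.5,
) -> Dict[str, List[str]]:
    """Greedily associate each trusted client with at most one edge server.

    trust:     hypothetical per-client trust score in [0, 1]
    link:      hypothetical reliability of each (client, server) link
    capacity:  maximum number of clients each server accepts
    min_trust: clients scoring below this threshold are not scheduled
    """
    # Build candidate pairs; the product is an assumed proxy for the
    # benefit of scheduling a client on a server, not the paper's objective.
    candidates = [
        (trust.get(c, 0.0) * q, c, s)
        for (c, s), q in link.items()
        if s in capacity and trust.get(c, 0.0) >= min_trust
    ]
    # Highest utility first; ties broken by name so the result is deterministic.
    candidates.sort(key=lambda t: (-t[0], t[1], t[2]))

    assignment: Dict[str, List[str]] = {s: [] for s in capacity}
    assigned = set()
    for _, c, s in candidates:
        # Skip clients that already have a server and servers that are full.
        if c in assigned or len(assignment[s]) >= capacity[s]:
            continue
        assignment[s].append(c)
        assigned.add(c)
    return assignment


if __name__ == "__main__":
    trust = {"c1": 0.9, "c2": 0.8, "c3": 0.3, "c4": 0.7}
    link = {
        ("c1", "s1"): 0.9, ("c1", "s2"): 0.5,
        ("c2", "s1"): 0.8, ("c2", "s2"): 0.6,
        ("c3", "s1"): 0.9,
        ("c4", "s1"): 0.9, ("c4", "s2"): 0.4,
    }
    print(greedy_schedule(trust, link, {"s1": 1, "s2": 2}))
```

In this toy run, the low-trust client c3 is excluded, c1 takes the single slot on s1, and c2 and c4 fall back to s2. A greedy pass of this kind is cheap, but it only approximates the best association; how close it comes depends on the objective actually being optimized.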

Paper Structure

This paper contains 18 sections, 1 theorem, 53 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

By setting the learning rate $\lambda = \frac{1}{L}$, the upper bound of the expected difference $\mathbb{E}\left(F\left(\boldsymbol{g}_{t+1}\right) - F\left(\boldsymbol{g}^*\right)\right)$ can be established as follows: where $D=1-\frac{\mu}{L}+\frac{4 \omega_2 \mu B}{L}$ , $B=\sum_{m \in \mathcal{S}} \frac{1}{N_m^{(S)}}\Psi$ , and $\Psi=\left(\sum_{i \in \mathcal{U}} n_i d_{i, m}\left(D_m-1+\ma

Figures (3)

  • Figure 1: The SD-FL system framework.
  • Figure 2: The test accuracy and training loss in scenarios with 10% low-quality clients: (a) MNIST, (b) EMNIST, (c) CIFAR-10, and (d) SVHN.
  • Figure 3: The test accuracy in scenarios with 10%, 30%, and 50% low-quality clients: (a) MNIST, (b) EMNIST, (c) CIFAR-10, and (d) SVHN.

Theorems & Definitions (1)

  • Theorem 1