Delay Sensitive Hierarchical Federated Learning with Stochastic Local Updates

Abdulmoneam Ali; Ahmed Arafa

TL;DR

The paper addresses federated learning in delay-prone networks by proposing a delay-sensitive hierarchical FL (HFL) framework with local parameter servers (LPSs) and a global parameter server (GPS). Local updates are stochastic and occur for a random number of iterations $t_i^u$ within a global sync window $S$, with the overall training time constrained by a system deadline $T$; the GPS aggregates after the maximum group latency. The authors derive a bound on the LPS-GPS divergence (Lemma 1) and establish convergence guarantees for non-convex objectives at both the local-group and global levels, highlighting how the number of groups, group sizes, and $S$ govern performance. They show a sublinear convergence rate $\mathcal{O}(1/\sqrt{\mathcal{U}})$ under reasonable local-time bounds and validate the theory with experiments across datasets, illustrating how to tune $S$ and clustering to mitigate delay effects and improve fairness. The work demonstrates that carefully designed synchronization and grouping can yield substantial gains in delay-constrained FL and provides practical guidance for deploying HFL in 6G-era networks.
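The timing structure described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' code: the exponential delay model, the function names `count_local_rounds` and `simulate_schedule`, and all parameter values are assumptions made for illustration. It counts how many local rounds each group completes inside one sync window $S$ (a stand-in for $t_i^u$) and how many global rounds fit before the deadline $T$ when the GPS waits for the slowest group.

```python
import random


def count_local_rounds(sync_time, mean_round_delay, rng):
    """Count local averaging rounds that fully complete within one sync window."""
    elapsed, rounds = 0.0, 0
    while True:
        # Assumed delay model: each local round takes an exponential time.
        elapsed += rng.expovariate(1.0 / mean_round_delay)
        if elapsed > sync_time:
            return rounds
        rounds += 1


def simulate_schedule(deadline, sync_time, local_delay_means, gps_delay_means, seed=0):
    """Return one list per global round holding each group's local round count."""
    rng = random.Random(seed)
    clock, schedule = 0.0, []
    while True:
        # The GPS aggregates only after the slowest group's upload arrives.
        gps_latency = max(rng.expovariate(1.0 / m) for m in gps_delay_means)
        if clock + sync_time + gps_latency > deadline:
            break  # the next global round would not finish before the deadline
        counts = [count_local_rounds(sync_time, m, rng) for m in local_delay_means]
        schedule.append(counts)
        clock += sync_time + gps_latency
    return schedule


if __name__ == "__main__":
    rounds = simulate_schedule(
        deadline=50.0,
        sync_time=5.0,
        local_delay_means=[0.5, 2.0],  # group 0 is faster than group 1
        gps_delay_means=[0.3, 0.3],
    )
    for u, counts in enumerate(rounds):
        print(f"global round {u}: local rounds per group = {counts}")
```

Under these assumptions, a larger sync window gives each group more local rounds per global round but fewer global rounds before the deadline, which is the trade-off the choice of $S$ controls.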

Abstract

The impact of local averaging on the performance of federated learning (FL) systems is studied in the presence of communication delay between the clients and the parameter server. To minimize the effect of delay, clients are assigned to different groups, each having its own local parameter server (LPS) that aggregates its clients' models. The groups' models are then aggregated at a global parameter server (GPS) that only communicates with the LPSs. Such a setting is known as hierarchical FL (HFL). Unlike most works in the literature, the number of local and global communication rounds in our work is randomly determined by the (different) delays experienced by each group of clients. Specifically, the number of local averaging rounds is tied to a wall-clock time period coined the sync time $S$, after which the LPSs synchronize their models by sharing them with the GPS. The sync time $S$ is then reapplied until a global wall-clock time is exhausted. First, an upper bound on the deviation between the updated model at each LPS and the model available at the GPS is derived. This bound is then used as a tool to derive the convergence analysis of our proposed delay-sensitive HFL algorithm, first at each LPS individually, and then at the GPS. Our theoretical convergence bound showcases the effects of the whole system's parameters, including the number of groups, the number of clients per group, and the value of $S$. Our results show that the value of $S$ should be carefully chosen, especially since it implicitly governs how the delay statistics affect the performance of HFL in situations where training time is restricted.
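The two-level aggregation described in the abstract can be sketched as follows. This is an illustrative example only: weighting each model by its number of training samples (FedAvg-style) is an assumption, since the aggregation weights are not stated here, and the helper names `weighted_average` and `hierarchical_average` are hypothetical.

```python
def weighted_average(models, weights):
    """Coordinate-wise weighted mean of equal-length parameter vectors."""
    total = sum(weights)
    dim = len(models[0])
    return [sum(w * m[k] for m, w in zip(models, weights)) / total for k in range(dim)]


def hierarchical_average(groups):
    """Aggregate clients at each LPS, then aggregate the LPS models at the GPS.

    groups: list of groups; each group is a list of (model_vector, num_samples).
    Returns (gps_model, lps_models).
    """
    lps_models, lps_weights = [], []
    for clients in groups:
        models = [model for model, _ in clients]
        sizes = [n for _, n in clients]
        # Local step: the LPS averages only its own clients' models.
        lps_models.append(weighted_average(models, sizes))
        # Assumption: a group's weight at the GPS is its total sample count.
        lps_weights.append(sum(sizes))
    # Global step: the GPS communicates with the LPSs only, never with clients.
    return weighted_average(lps_models, lps_weights), lps_models


if __name__ == "__main__":
    groups = [
        [([1.0, 0.0], 1), ([3.0, 0.0], 1)],  # group 0: two clients
        [([0.0, 4.0], 2)],                   # group 1: one client
    ]
    gps_model, lps_models = hierarchical_average(groups)
    print("LPS models:", lps_models)
    print("GPS model:", gps_model)
```

In the delay-sensitive setting, the local step would run repeatedly within each sync window, while the global step would run once per window after the slowest group reports.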

Paper Structure (14 sections, 4 theorems, 38 equations, 13 figures, 1 table, 1 algorithm)

Introduction
System Model
Main Results
Experiments
HFL Incentive and Motivation
Significance of Choosing the Sync Time $S$
Clustering: Effect of the Number of Groups
Effects of Client Association and Global Delay
Conclusions
Preliminaries
Proof of Lemma 1
Proof of Theorem 1
Proof of Theorem 2
Proof of the Corollary

Key Result

Lemma 1

For $0 \leq \alpha \leq \frac{1}{L}$, the delay sensitive HFL algorithm satisfies the following $\forall u, i$:

Figures (13)

Figure 1: System model of delay sensitive HFL.
Figure 2: Example sample path of global rounds and local iterations of 2 groups with wall-clock times considerations.
Figure 3: HFL: 10 clients per group, parameters $\{0.09,0.1,0.009,0.01,1,3,0.05,0.1\}$, and $S=5$.
Figure 4: Significance of group cooperation under non-i.i.d data.
Figure 5: Significance of group cooperation.
...and 8 more figures

Theorems & Definitions (11)

Lemma 1
Remark 1
Remark 2
Remark 3
Theorem 1: Convergence Analysis per Group
Remark 4
Theorem 2: Global Convergence Analysis
Remark 5
Remark 6
Remark 7
...and 1 more
