Adaptive Deadline and Batch Layered Synchronized Federated Learning

Asaf Goren, Natalie Lang, Nir Shlezinger, Alejandro Cohen

Abstract

Federated learning (FL) enables collaborative model training across distributed edge devices while preserving data privacy, and typically operates in a round-based synchronous manner. However, synchronous FL suffers from latency bottlenecks due to device heterogeneity, where slower clients (stragglers) delay or degrade global updates. Prior solutions, such as fixed deadlines, client selection, and layer-wise partial aggregation, alleviate the effect of stragglers, but treat round timing and local workload as static parameters, limiting their effectiveness under strict time constraints. We propose ADEL-FL, a novel framework that jointly optimizes per-round deadlines and user-specific batch sizes for layer-wise aggregation. Our approach formulates a constrained optimization problem that minimizes the expected L2 distance to the global optimum subject to constraints on the total training time and the number of global rounds. We provide a convergence analysis under exponential compute models and prove that ADEL-FL yields unbiased updates with bounded variance. Extensive experiments demonstrate that ADEL-FL outperforms alternative methods in both convergence rate and final accuracy under heterogeneous conditions.

Paper Structure

This paper contains 24 sections, 4 theorems, 54 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

When the Assumptions (convex smoothness through gradient bound) and the Model Formulations (computational model through batch size) hold, the value of $p_{t}^{l} \triangleq P(|\mathcal{U}_{t}^{l}| = 0)$ is bounded by: where $Q(s, x) \triangleq \frac{1}{\Gamma(s)} \int_x^{\infty} t^{s-1} e^{-t} dt$ is the regularized upper incomplete gamma function (DLMF).
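
The function $Q(s, x)$ in Lemma 1 can be evaluated numerically. What follows is a minimal sketch, not the paper's code. It assumes a positive integer $s$, for which the standard identity $Q(s, x) = e^{-x} \sum_{k=0}^{s-1} x^k / k!$ holds; this is also the probability that a sum of $s$ independent unit-rate exponential variables exceeds $x$. The function names `reg_upper_gamma_int` and `reg_upper_gamma_quad` are hypothetical, and the second one only cross-checks the integral definition by a truncated midpoint rule. How the bound in Lemma 1 chooses $s$ and $x$ is not shown here.

```python
import math


def reg_upper_gamma_int(s: int, x: float) -> float:
    """Q(s, x) for a positive integer s and x >= 0 (assumption: integer s).

    Uses the finite-sum identity Q(s, x) = exp(-x) * sum_{k<s} x**k / k!.
    """
    if s < 1 or x < 0:
        raise ValueError("requires integer s >= 1 and x >= 0")
    term = 1.0   # x**0 / 0!
    total = 1.0
    for k in range(1, s):
        term *= x / k          # builds x**k / k! incrementally
        total += term
    return math.exp(-x) * total


def reg_upper_gamma_quad(s: float, x: float,
                         upper: float = 60.0, steps: int = 20_000) -> float:
    """Midpoint-rule check of the integral definition, truncated at `upper`."""
    h = (upper - x) / steps
    acc = 0.0
    for i in range(steps):
        t = x + (i + 0.5) * h
        acc += t ** (s - 1) * math.exp(-t)
    return acc * h / math.gamma(s)


if __name__ == "__main__":
    closed = reg_upper_gamma_int(3, 2.0)
    numeric = reg_upper_gamma_quad(3, 2.0)
    print(closed, numeric)
```

For non-integer $s$ the finite sum does not apply, and a library routine for the regularized upper incomplete gamma function would be the usual choice.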

Figures (4)

  • Figure 1: ADEL-FL illustration. Initially, the server executes an optimization step to determine the adaptive per-round deadlines $\{T_{t}^{\rm{d}}\}_{t=1}^{R}$, represented by the varying amount of sand in the hourglass. Then, for every round, each user performs depth-limited backpropagation, with the extent of computation governed by the deadline. The server aggregates layer-wise gradients across users using only the layers received before the deadline. The color of each layer reflects the set of users that contributed to it, visualized by blending the corresponding user colors. For example, if User $1$ is green and User $U$ is blue, a cyan-colored layer indicates that both contributed to that layer in the current round.
  • Figure 2: Deadline allocation and convergence curves for MNIST using an inverse decaying learning rate. $(a)$ Adaptive deadline per round for ADEL-FL (MLP). $(b)$ Convergence, MLP. $(c)$ Adaptive deadline per round for ADEL-FL (CNN).
  • Figure 3: Deadline allocation and convergence curves for CIFAR-10 using an inverse decaying learning rate. $(a)$ Adaptive deadline per round for ADEL-FL (VGG11). $(b)$ Convergence, VGG11. $(c)$ Adaptive deadline per round for ADEL-FL (VGG11). $(d)$ Convergence, VGG11.
  • Figure 4: Robustness results, CIFAR-10 VGG11. $(a)$ Convergence with $\ell_2$ regularization. $(b)$ Convergence with constant learning rate. $(c)$ Convergence with $E=3$ local iterations. $(d)$ Convergence with $E=5$ local iterations.
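
The layer-wise aggregation described in the Figure 1 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: each user is assumed to report a dictionary mapping a layer index to the gradient it finished before the deadline, the server is assumed to take a plain unweighted mean over the contributing users, and a layer that nobody delivered (the event $|\mathcal{U}_{t}^{l}| = 0$ from Lemma 1) is returned as `None`. The name `aggregate_layerwise` and the data layout are hypothetical, and the paper's actual weighting may differ.

```python
from typing import Dict, List, Optional

Gradient = List[float]


def aggregate_layerwise(
    user_grads: List[Dict[int, Gradient]], num_layers: int
) -> List[Optional[Gradient]]:
    """Average each layer's gradient over the users that delivered it.

    Assumption: plain mean over contributors; None marks a layer with
    no contributors in this round.
    """
    aggregated: List[Optional[Gradient]] = []
    for layer in range(num_layers):
        # Keep only the users whose backpropagation reached this layer in time.
        contributions = [g[layer] for g in user_grads if layer in g]
        if not contributions:
            aggregated.append(None)
            continue
        size = len(contributions[0])
        mean = [
            sum(c[i] for c in contributions) / len(contributions)
            for i in range(size)
        ]
        aggregated.append(mean)
    return aggregated


# Backpropagation runs from the last layer toward the first, so a slower
# user delivers only the deepest layers.
user_a = {2: [1.0, 3.0]}                    # reached layer 2 only
user_b = {2: [3.0, 5.0], 1: [0.5, -0.5]}    # reached layers 2 and 1
result = aggregate_layerwise([user_a, user_b], num_layers=3)
print(result)
```

In this example layer 2 is averaged over both users, layer 1 comes from `user_b` alone, and layer 0 has no contributor, so the server would have nothing to apply to it in that round.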

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Theorem 1