Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates

Natalie Lang, Alejandro Cohen, Nir Shlezinger

TL;DR

This work proposes straggler-aware layer-wise federated learning (SALF), which exploits the backpropagation-based optimization of NNs to update the global model layer by layer. The analysis shows that SALF converges at the same asymptotic rate as FL with no timing limitations.

Abstract

Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which affect the trained model performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF), which leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, so that each layer of the global model is updated independently with a different contributing set of users. We provide a theoretical analysis that establishes convergence guarantees for the global model under mild assumptions on the distribution of the participating devices, and shows that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched by empirical observations, which demonstrate the performance gains of SALF over alternative mechanisms for mitigating the device heterogeneity gap in FL.
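The layer-wise update described in the abstract can be illustrated with a minimal Python sketch. It assumes that backpropagation proceeds from the last layer to the first, so a straggler that stops early has updated only the last few layers. The function names (`truncate_by_depth`, `aggregate_layerwise`), the list-of-lists model representation, and the use of plain averaging over the contributing users are illustrative assumptions; the exact aggregation weights used by SALF are not reproduced here.

```python
from typing import List, Optional

Layer = List[float]  # hypothetical flat representation of one layer's weights


def truncate_by_depth(local_model: List[Layer], depth: int) -> List[Optional[Layer]]:
    """Keep only the last `depth` layers of a local update.

    Assumption: backpropagation runs from the last layer to the first, so a
    user that ran out of time after `depth` layers has no update for the rest.
    """
    num_layers = len(local_model)
    return [
        layer if i >= num_layers - depth else None
        for i, layer in enumerate(local_model)
    ]


def aggregate_layerwise(
    global_model: List[Layer],
    user_updates: List[List[Optional[Layer]]],
) -> List[Layer]:
    """Update each layer from the users that actually reached it.

    Plain averaging is an assumption made for illustration only.
    """
    new_model = []
    for l, old_layer in enumerate(global_model):
        contributions = [u[l] for u in user_updates if u[l] is not None]
        if not contributions:
            # No user reached this layer in time: keep the previous weights.
            new_model.append(list(old_layer))
            continue
        n = len(contributions)
        new_model.append([sum(vals) / n for vals in zip(*contributions)])
    return new_model


# Three-layer toy model and three users with different backpropagation depths.
global_model = [[0.0, 0.0], [0.0], [0.0, 0.0]]
updates = [
    truncate_by_depth([[1.0, 1.0], [2.0], [3.0, 3.0]], depth=3),  # finished
    truncate_by_depth([[9.0, 9.0], [9.0], [5.0, 5.0]], depth=1),  # straggler
    truncate_by_depth([[9.0, 9.0], [4.0], [1.0, 1.0]], depth=2),  # straggler
]
new_global = aggregate_layerwise(global_model, updates)
```

In this toy run the first layer is updated by one user, the second by two, and the last by all three, which mirrors the idea that each layer has its own contributing set of users.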

Paper Structure

This paper contains 24 sections, 4 theorems, 29 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

When itm:random_stragglers holds, then for every FL round $t$ and DNN layer $l$, [...] and the cardinality of the set $\mathcal{U}_t^l$ is distributed as [...], where $\mathrm{Bin}$ denotes the Binomial distribution.
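As a rough numerical illustration of why a Binomial law is natural here, the following sketch counts how many users contribute to a given layer in each round. It assumes, purely for illustration, that each of `num_users` users independently reaches the layer with the same probability `reach_prob`; these names and values are hypothetical and are not the lemma's actual parameters. Under that assumption the count is Binomial with mean `num_users * reach_prob` and variance `num_users * reach_prob * (1 - reach_prob)`.

```python
import random
from typing import List, Tuple


def simulate_contributors(
    num_users: int, reach_prob: float, num_rounds: int, seed: int = 0
) -> List[int]:
    """Count, per round, how many users reach a given layer in time.

    Assumption: users reach the layer independently with equal probability.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(num_rounds):
        count = sum(1 for _ in range(num_users) if rng.random() < reach_prob)
        counts.append(count)
    return counts


def mean_and_variance(samples: List[int]) -> Tuple[float, float]:
    """Return the empirical mean and (population) variance of the samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


counts = simulate_contributors(num_users=20, reach_prob=0.7, num_rounds=20000)
mean, var = mean_and_variance(counts)
print(f"empirical mean={mean:.3f}, Binomial mean={20 * 0.7:.3f}")
print(f"empirical var={var:.3f}, Binomial var={20 * 0.7 * 0.3:.3f}")
```

The empirical mean and variance should land close to the Binomial values, which is the kind of behavior the lemma formalizes for the set $\mathcal{U}_t^l$.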

Figures (7)

  • Figure 1: A device-heterogeneous FL-aided system learning object recognition that is expected to operate under tight latency and edge power constraints. Note that $\boldsymbol{w}_t$ and $\boldsymbol{w}_{u,t}$ denote the global and local models, respectively, and $T^u_t$ is the local computational time of user $u$ in FL round $t$.
  • Figure 2: Illustrative overview of SALF for training a deep CNN. The left dashed box represents local training, where colored layers correspond to gradients calculated within $T_{\max}$; the right dashed box shows the layer-wise aggregation with updated colored global model layers.
  • Figure 3: FL convergence profile, MLP trained on MNIST.
  • Figure 4: FL convergence profile, CNN trained on MNIST.
  • Figure 5: Test set accuracy vs. latency constraints, CNN trained on MNIST.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Lemma 1
  • Lemma 2: Unbiasedness
  • Lemma 3: Bounded variance
  • Theorem 3.1