Sharp Bounds for Sequential Federated Learning on Heterogeneous Data

Yipeng Li, Xinchen Lyu

TL;DR

This work provides sharp convergence guarantees for sequential federated learning (SFL) on heterogeneous data by establishing upper bounds for strongly convex, general convex, and non-convex objectives and matching lower bounds for the convex cases. By introducing and analyzing the effective learning rate $\tilde{\eta}=\eta MK$ and the two-learning-rate mechanism, it demonstrates that SFL can outperform parallel FL (PFL) under heterogeneity, with tight bounds that align with stochasticity and heterogeneity terms. The authors verify theoretical findings through experiments on quadratic functions, logistic regression, and deep neural networks, showing counterintuitive advantages of SFL in substantial heterogeneity settings. The results illuminate fundamental trade-offs between optimization and error terms and provide guidance for choosing learning rates in sequential federated workflows. Overall, the paper advances the theoretical understanding of SFL and reinforces its practical relevance for decentralized learning on non-iid data.

Abstract

There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients, and sequential FL (SFL), where models are trained in a sequential manner across clients. Specifically, in PFL, clients perform local updates independently and send the updated model parameters to a global server for aggregation; in SFL, one client starts its local updates only after receiving the model parameters from the previous client in the sequence. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To resolve the theoretical dilemma of SFL, we establish sharp convergence guarantees for SFL on heterogeneous data with both upper and lower bounds. Specifically, we derive the upper bounds for the strongly convex, general convex and non-convex objective functions, and construct the matching lower bounds for the strongly convex and general convex objective functions. Then, we compare the upper bounds of SFL with those of PFL, showing that SFL outperforms PFL on heterogeneous data (at least, when the level of heterogeneity is relatively high). Experimental results validate the counterintuitive theoretical finding.
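The difference between the two paradigms can be sketched in a few lines of Python. This is a minimal illustration, not the paper's algorithm: the scalar quadratic objectives, the `centers` values, the fixed client order, and the function names are all assumptions made for the example, and the gradients are deterministic (no stochastic noise).

```python
def local_steps(x, grad, eta, K):
    """Run K local gradient steps of size eta on one client, starting from x."""
    for _ in range(K):
        x = x - eta * grad(x)
    return x


def pfl_round(x, grads, eta, K):
    """Parallel FL: every client starts from the same x; the server averages."""
    client_models = [local_steps(x, g, eta, K) for g in grads]
    return sum(client_models) / len(client_models)


def sfl_round(x, grads, eta, K):
    """Sequential FL: each client starts from the previous client's output."""
    for g in grads:  # fixed client order (an assumption of this sketch)
        x = local_steps(x, g, eta, K)
    return x


# Hypothetical heterogeneous clients: F_m(x) = 0.5 * (x - c_m)^2, gradient x - c_m.
centers = [-1.0, 0.0, 1.0]
grads = [lambda x, c=c: x - c for c in centers]

x_pfl = x_sfl = 10.0
for _ in range(200):
    x_pfl = pfl_round(x_pfl, grads, eta=0.01, K=5)
    x_sfl = sfl_round(x_sfl, grads, eta=0.01, K=5)
print(x_pfl, x_sfl)
```

In this sketch a parallel round averages M independent runs of K steps, while a sequential round chains M·K steps one after another, so the two schemes move the model by different amounts per round even with the same local step size.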

Paper Structure (50 sections, 18 theorems, 144 equations, 5 figures, 7 tables, 4 algorithms)


Key Result

Theorem 3

Let all the local objectives be $L$-smooth (see the definition of smoothness). For SFL (Algorithm 1), there exist a constant effective learning rate $\tilde{\eta} \coloneqq \eta MK$ and weights $\{w_r\}_{r\geq 0}$ such that the weighted average of the global model parameters $\bar{\mathbf{x}}$ […] Here $D\coloneqq\left\lVert x^{(0)}-x^\ast\right\rVert$ for the convex cases and $A \coloneqq F(\ldots)$ […]
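One way to read the effective learning rate $\tilde{\eta} = \eta MK$: a single SFL round applies $M \cdot K$ local steps of size $\eta$, so for small $\eta$ it behaves roughly like one gradient step of size $\eta MK$ on the average objective. The sketch below checks this numerically on a toy problem. The quadratic clients, the `centers` values, and the helper names are assumptions for illustration, and the global objective is taken to be the plain average of the local objectives.

```python
def sfl_round(x, grads, eta, K):
    """One sequential round: M clients in a row, K gradient steps each."""
    for g in grads:
        for _ in range(K):
            x = x - eta * g(x)
    return x


def global_gd_step(x, grads, eta_eff):
    """One gradient step on the average objective F = (1/M) * sum_m F_m."""
    avg_grad = sum(g(x) for g in grads) / len(grads)
    return x - eta_eff * avg_grad


# Hypothetical clients: F_m(x) = 0.5 * (x - c_m)^2, gradient x - c_m.
centers = [-1.0, 0.0, 1.0]
grads = [lambda x, c=c: x - c for c in centers]

eta, K, M = 0.001, 5, len(grads)
eta_eff = eta * M * K  # effective learning rate, eta * M * K

x0 = 10.0
after_round = sfl_round(x0, grads, eta, K)
after_step = global_gd_step(x0, grads, eta_eff)
gap = abs(after_round - after_step)
print(eta_eff, after_round, after_step, gap)
```

The two results agree only to first order in $\eta$; the gap grows as the local step size or the heterogeneity between clients increases.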

Figures (5)

  • Figure 1: Illustration of SFL and PFL.
  • Figure 2: Results of the experiments on quadratic functions, covering the ten groups listed in the simulation-settings table. The top (bottom) row shows the first (last) five groups from left to right. We set $K=10$. The shaded areas show the min-max values across 10 random seeds.
  • Figure 3: Training loss results of PFL and SFL. The top row shows the results when $\omega = 0.0$ and the bottom row shows the results when $\omega = 0.0001$. The shaded areas show the min-max values across 10 random seeds.
  • Figure 4: Test accuracy results of PFL and SFL on CIFAR-10. For visualization, we apply moving average over a window length of 5 data points. The shaded areas show the standard deviation across 3 random seeds.
  • Figure 5: The mechanism of "two learning rates" in SFL and PFL. In SFL, the global update is performed at the last client, which combines its own parameters ${\mathbf{x}}_{M,K}^{(r)}$ with the initial parameters ${\mathbf{x}}^{(r)}$ received from the first client.
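The "two learning rates" idea from the Figure 5 caption can be sketched as follows. The caption only says that the last client combines $\mathbf{x}_{M,K}^{(r)}$ with $\mathbf{x}^{(r)}$; the specific update form used here, $\mathbf{x}^{(r+1)} = \mathbf{x}^{(r)} + \eta_g (\mathbf{x}_{M,K}^{(r)} - \mathbf{x}^{(r)})$, is an assumption borrowed from the usual server-learning-rate convention, and the names `eta_local`, `eta_global`, and the toy clients are hypothetical.

```python
def sfl_round_two_lr(x_start, grads, eta_local, eta_global, K):
    """One SFL round with a local and a global learning rate (assumed form)."""
    x = x_start
    for g in grads:
        for _ in range(K):
            x = x - eta_local * g(x)
    # x now plays the role of x_{M,K}^{(r)}, held by the last client.
    # Assumed global update: move from x_start toward x by a factor eta_global.
    return x_start + eta_global * (x - x_start)


# Hypothetical clients: F_m(x) = 0.5 * (x - c_m)^2, gradient x - c_m.
centers = [-1.0, 0.0, 1.0]
grads = [lambda x, c=c: x - c for c in centers]

x0 = 10.0
plain = sfl_round_two_lr(x0, grads, eta_local=0.01, eta_global=1.0, K=5)
frozen = sfl_round_two_lr(x0, grads, eta_local=0.01, eta_global=0.0, K=5)
doubled = sfl_round_two_lr(x0, grads, eta_local=0.01, eta_global=2.0, K=5)
print(plain, frozen, doubled)
```

Under this assumed form, `eta_global=1.0` reduces to passing the last client's parameters on unchanged, and other values rescale the displacement accumulated over the round.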

Theorems & Definitions (20)

  • Definition 1
  • Definition 2
  • Theorem 3
  • Corollary 4
  • Theorem 5
  • Theorem 6
  • Theorem 7
  • Lemma 8
  • Lemma 9
  • Lemma 10
  • ...and 10 more