Sharp Bounds for Sequential Federated Learning on Heterogeneous Data

Yipeng Li; Xinchen Lyu

TL;DR

This work provides sharp convergence guarantees for sequential federated learning (SFL) on heterogeneous data: upper bounds for strongly convex, general convex, and non-convex objectives, and matching lower bounds for the strongly convex and general convex cases. The analysis is stated in terms of the effective learning rate $\tilde{\eta} \coloneqq \eta MK$ and also covers the "two learning rates" mechanism. Comparing the upper bounds of SFL with those of parallel FL (PFL) shows that SFL can outperform PFL on heterogeneous data, at least when the level of heterogeneity is relatively high. Experiments on quadratic functions, logistic regression, and deep neural networks support this counterintuitive finding. The bounds also separate the optimization term from the error terms caused by stochasticity and heterogeneity, which informs the choice of learning rates in sequential federated workflows.
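To make the effective learning rate concrete, the following is a short worked identity rather than a result quoted from the paper. It assumes that each of the $M$ clients runs $K$ local SGD steps with learning rate $\eta$ in one round, and it uses $\mathbf{g}_{m,k}^{(r)}$ as illustrative notation (introduced here) for the stochastic gradient at the $k$-th local step of the $m$-th client visited in round $r$.

```latex
\begin{align*}
\mathbf{x}^{(r+1)}
  &= \mathbf{x}^{(r)} - \eta \sum_{m=1}^{M} \sum_{k=0}^{K-1} \mathbf{g}_{m,k}^{(r)} \\
  &= \mathbf{x}^{(r)} - \tilde{\eta} \cdot \frac{1}{MK} \sum_{m=1}^{M} \sum_{k=0}^{K-1} \mathbf{g}_{m,k}^{(r)},
  \qquad \tilde{\eta} \coloneqq \eta MK .
\end{align*}
```

Under these assumptions, one round of SFL acts like a single step of size $\tilde{\eta}$ along the average of the $MK$ local gradients, which is why the bounds are naturally expressed in $\tilde{\eta}$ rather than $\eta$.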

Abstract

There are two paradigms in Federated Learning (FL): parallel FL (PFL), where models are trained in a parallel manner across clients, and sequential FL (SFL), where models are trained in a sequential manner across clients. Specifically, in PFL, clients perform local updates independently and send the updated model parameters to a global server for aggregation; in SFL, one client starts its local updates only after receiving the model parameters from the previous client in the sequence. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. To resolve the theoretical dilemma of SFL, we establish sharp convergence guarantees for SFL on heterogeneous data with both upper and lower bounds. Specifically, we derive the upper bounds for the strongly convex, general convex and non-convex objective functions, and construct the matching lower bounds for the strongly convex and general convex objective functions. Then, we compare the upper bounds of SFL with those of PFL, showing that SFL outperforms PFL on heterogeneous data (at least, when the level of heterogeneity is relatively high). Experimental results validate the counterintuitive theoretical finding.
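The two update patterns described above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation: the function names (`local_sgd`, `pfl_round`, `sfl_round`), the one-dimensional quadratic objectives $F_m(x) = \tfrac{1}{2}(x - c_m)^2$, the exact (noise-free) gradients, and the random client order drawn each round are all assumptions made for this example.

```python
import random

def local_sgd(x, grad, lr, num_steps):
    """Run num_steps gradient steps on one client's objective."""
    for _ in range(num_steps):
        x = x - lr * grad(x)
    return x

def pfl_round(x, grads, lr, num_steps):
    """Parallel FL: every client starts from the same x; the server averages."""
    local_models = [local_sgd(x, g, lr, num_steps) for g in grads]
    return sum(local_models) / len(local_models)

def sfl_round(x, grads, lr, num_steps, rng):
    """Sequential FL: each client starts from the previous client's output."""
    order = list(range(len(grads)))
    rng.shuffle(order)  # assumed: a fresh client order every round
    for m in order:
        x = local_sgd(x, grads[m], lr, num_steps)
    return x

def make_grad(center):
    """Gradient of the toy objective 0.5 * (x - center)^2."""
    return lambda x: x - center

# Heterogeneous clients: different minimizers; the average objective
# is minimized at the mean of the centers, which is 1.0 here.
centers = [-1.0, 0.0, 4.0]
grads = [make_grad(c) for c in centers]

rng = random.Random(0)
x_pfl = x_sfl = 10.0
for _ in range(200):
    x_pfl = pfl_round(x_pfl, grads, lr=0.01, num_steps=5)
    x_sfl = sfl_round(x_sfl, grads, lr=0.01, num_steps=5, rng=rng)

print(f"PFL: {x_pfl:.4f}  SFL: {x_sfl:.4f}  optimum: 1.0")
```

Both loops use the same local learning rate and number of local steps; the only difference is whether clients start from a shared model (PFL) or from the previous client's output (SFL). The toy setup is too simple to reproduce the paper's comparison and is meant only to show the control flow.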

Paper Structure (50 sections, 18 theorems, 144 equations, 5 figures, 7 tables, 4 algorithms)

Introduction
Related Work
Setup
Convergence Analysis of SFL
Assumptions
Upper Bounds of SFL
Lower Bounds of SFL
Comparison Between PFL and SFL
Comparison under Assumption \ref{asm:heterogeneity:optimum}
Comparison under Assumption \ref{asm:heterogeneity:max}
Experiments
Experiments on Quadratic Functions
Experiments on Logistic Regression
Experiments on Deep Neural Networks
Conclusion
...and 35 more sections

Key Result

Theorem 3

Let all the local objectives be $L$-smooth (Definition \ref{def:smoothness}). For SFL (Algorithm 1), there exist a constant effective learning rate $\tilde{\eta} \coloneqq \eta MK$ and weights $\{w_r\}_{r\geq 0}$, such that the weighted average of the global model parameters $\bar{\mathbf{x}}$ ... Here $D \coloneqq \left\lVert x^{(0)} - x^\ast \right\rVert$ for the convex cases and $A \coloneqq F(\ldots$

Figures (5)

Figure 1: Illustration of SFL and PFL.
Figure 2: Results of the experiments on quadratic functions, showing the ten groups in Table \ref{tab:simulation settings}. The top (bottom) row shows the first (last) five groups from left to right. We set $K=10$. The shaded areas show the min-max values across 10 random seeds.
Figure 3: Training loss results of PFL and SFL. The top row shows the results when $\omega = 0.0$ and the bottom row shows the results when $\omega = 0.0001$. The shaded areas show the min-max values across 10 random seeds.
Figure 4: Test accuracy results of PFL and SFL on CIFAR-10. For visualization, we apply moving average over a window length of 5 data points. The shaded areas show the standard deviation across 3 random seeds.
Figure 5: The mechanism of "two learning rates" in SFL and PFL. The global updates of SFL are performed at the last client. It performs the global updates with its parameters ${\mathbf{x}}_{M,K}^{(r)}$ and the initial parameters ${\mathbf{x}}^{(r)}$ received from the first client.
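The "two learning rates" mechanism in the Figure 5 caption can be sketched as follows. This is a hedged illustration only: the global update form $\mathbf{x}^{(r+1)} = \mathbf{x}^{(r)} - \eta_g \left(\mathbf{x}^{(r)} - \mathbf{x}_{M,K}^{(r)}\right)$ is an assumed, commonly used form rather than one quoted from the paper, and the names `sequential_pass`, `sfl_round_two_lr`, `local_lr`, and `global_lr` are hypothetical. The toy gradients are exact gradients of one-dimensional quadratics.

```python
def sequential_pass(x, grads, local_lr, num_steps):
    """Pass the model through all clients in order, num_steps local steps each."""
    for grad in grads:
        for _ in range(num_steps):
            x = x - local_lr * grad(x)
    return x

def sfl_round_two_lr(x_start, grads, local_lr, global_lr, num_steps):
    """One SFL round with a separate global step (assumed update form).

    The last client holds x_last (its final local parameters) and the
    round's initial parameters x_start, and combines them.
    """
    x_last = sequential_pass(x_start, grads, local_lr, num_steps)
    return x_start - global_lr * (x_start - x_last)

# Toy clients: gradient of 0.5 * (x - c)^2 for three different centers c.
grads = [lambda x, c=c: x - c for c in (-1.0, 0.0, 4.0)]

x0 = 10.0
x_last = sequential_pass(x0, grads, local_lr=0.01, num_steps=5)
x_plain = sfl_round_two_lr(x0, grads, local_lr=0.01, global_lr=1.0, num_steps=5)
x_damped = sfl_round_two_lr(x0, grads, local_lr=0.01, global_lr=0.5, num_steps=5)
```

With `global_lr=1.0` the round returns the last client's parameters unchanged, so plain SFL is recovered as a special case; other values scale the round's total displacement.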

Theorems & Definitions (20)

Definition 1
Definition 2
Theorem 3
Corollary 4
Theorem 5
Theorem 6
Theorem 7
Lemma 8
Lemma 9
Lemma 10
...and 10 more
