Convergence Analysis of Sequential Federated Learning on Heterogeneous Data

Yipeng Li; Xinchen Lyu

TL;DR

This work analyzes the convergence of sequential federated learning (SFL) on heterogeneous data, establishing guarantees for strongly convex, general convex, and non-convex objectives. By introducing an effective learning rate $\tilde{\eta}=\eta MK$ and bounding the stochasticity and heterogeneity terms, it derives upper bounds on $\mathbb{E}[F(\bar{x}^{(R)})-F(x^*)]$ that show when SFL outperforms parallel FL (PFL) on non-i.i.d. data. Experiments on quadratic functions and on real datasets partitioned with the Extended Dirichlet strategy support the counterintuitive finding that SFL can surpass PFL in extremely heterogeneous cross-device scenarios. The findings bear on the choice between sequential and parallel training in FL and on tuning the local update count $K$ to balance optimization progress against heterogeneity error.
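As a rough intuition for the effective learning rate: in one SFL round with full participation, the $M$ clients each take $K$ local steps of size $\eta$ one after another, so the model undergoes $MK$ sequential updates. If every stochastic gradient were evaluated at the round's starting point (an idealization used here only for illustration, not a claim from the paper), the round would collapse into a single step of size $\tilde{\eta}$ along the averaged gradient. The symbol $g_{m,k}$ is illustrative notation for the $k$-th stochastic gradient on client $m$.

```latex
x^{(r+1)}
  = x^{(r)} - \eta \sum_{m=1}^{M} \sum_{k=0}^{K-1} g_{m,k}
  = x^{(r)} - \underbrace{\eta M K}_{\tilde{\eta}} \cdot
    \frac{1}{MK} \sum_{m=1}^{M} \sum_{k=0}^{K-1} g_{m,k}
```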

Abstract

There are two categories of methods in Federated Learning (FL) for joint training across multiple clients: i) parallel FL (PFL), where clients train models in a parallel manner; and ii) sequential FL (SFL), where clients train models in a sequential manner. In contrast to that of PFL, the convergence theory of SFL on heterogeneous data is still lacking. In this paper, we establish the convergence guarantees of SFL for strongly/general/non-convex objectives on heterogeneous data. The convergence guarantees of SFL are better than those of PFL on heterogeneous data with both full and partial client participation. Experimental results validate the counterintuitive analysis result that SFL outperforms PFL on extremely heterogeneous data in cross-device settings.
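The difference between the two training modes can be sketched on toy one-dimensional quadratics. This is a minimal illustration under stated assumptions, not the paper's algorithms or experimental setup: the local objectives $F_m(x) = \tfrac{1}{2} a_m (x - b_m)^2$, the deterministic gradients, the per-round random client order, and the function names `sfl_round`, `pfl_round`, and `global_minimizer` are all hypothetical choices made for this sketch.

```python
import random


def grad(x, a, b):
    """Gradient of the assumed local objective F_m(x) = 0.5 * a * (x - b)**2."""
    return a * (x - b)


def sfl_round(x, clients, lr, K, rng):
    """One sequential round: each client starts from the model left by the previous one."""
    order = list(clients)
    rng.shuffle(order)  # assumption: a fresh random client order every round
    for a, b in order:
        for _ in range(K):
            x -= lr * grad(x, a, b)
    return x


def pfl_round(x, clients, lr, K):
    """One parallel round: every client starts from the same model; results are averaged."""
    local_models = []
    for a, b in clients:
        y = x
        for _ in range(K):
            y -= lr * grad(y, a, b)
        local_models.append(y)
    return sum(local_models) / len(local_models)


def global_minimizer(clients):
    """Minimizer of the average objective F(x) = mean over m of F_m(x)."""
    return sum(a * b for a, b in clients) / sum(a for a, _ in clients)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical heterogeneous clients: same curvature, different local optima.
    clients = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0), (1.0, 4.0)]
    x_sfl = x_pfl = 10.0
    for _ in range(50):
        x_sfl = sfl_round(x_sfl, clients, lr=0.01, K=5, rng=rng)
        x_pfl = pfl_round(x_pfl, clients, lr=0.01, K=5)
    x_star = global_minimizer(clients)
    print(f"optimum {x_star:.3f}  SFL error {abs(x_sfl - x_star):.4f}  "
          f"PFL error {abs(x_pfl - x_star):.4f}")
```

With the same `lr` and `K`, a sequential round applies $MK$ consecutive updates to one model while a parallel round applies $K$ updates per client and then averages, which is why the two are usually compared at matched effective learning rates rather than matched `lr`. The toy run says nothing about which method wins in general.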

Paper Structure

This paper contains 36 sections, 8 theorems, 80 equations, 5 figures, 5 tables, and 4 algorithms.

Introduction
Motivation.
Setup.
Contributions
Brief literature review.
Challenges.
Contributions.
Convergence theory
Assumptions
Convergence analysis of SFL
PFL vs. SFL on heterogeneous data
Experiments
Experiments on quadratic functions
Experiments on real datasets
Extended Dirichlet strategy.
...and 21 more sections

Key Result

Theorem 1

Let all the local objectives be $L$-smooth (the smoothness assumption). For SFL (Algorithm 1), there exist a constant effective learning rate $\tilde{\eta} \coloneqq \eta MK$ and weights $\{w_r\}_{r\geq 0}$ such that the weighted average of the global parameters $\bar{\mathbf{x}}^{(R)}$ […] Here $D \coloneqq \lVert x^{(0)} - x^\ast \rVert$ for the convex cases and $A \coloneqq F(\ldots)$ […]

Figures (5)

Figure 1: Simulations on quadratic functions. The panels show, from left to right, the results for Group 1 to Group 4 in the table of simulation settings. Shaded areas show the min-max values.
Figure 2: Test accuracies after training VGG-9 on CIFAR-10 for 1000 training rounds with different learning rates.
Figure 4: Overviews of paradigms in FL and SL. The top row shows the FL algorithms, SFL and PFL. The bottom row shows the SL algorithms, SSL and SplitFed.
Figure 5: Illustration of FL with cyclic client participation with $M = 12$ clients divided into $\overline{K} = 3$ groups. In each training round, $N = 2$ clients are selected for training from the client group. All groups are traversed once in a cycle-epoch consisting of $\overline{K}$ training rounds (Cho et al., 2023).
Figure 6: Visualization of FL with shuffling client participation for 6 clients, each with 3 datapoints. Two clients are sampled in each communication round (Malinovsky et al., 2023).
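The cyclic participation pattern described in the Figure 5 caption can be sketched as a schedule generator. This is only an illustration: the equal-sized contiguous groups, the uniform sampling without replacement inside the active group, and the function name `cyclic_schedule` are assumptions of this sketch and are not taken from the paper or from the cited work.

```python
import random


def cyclic_schedule(num_clients, num_groups, clients_per_round, num_rounds, seed=0):
    """Return, for each round, the client indices picked from the active group.

    Groups are visited in a fixed cyclic order, so every group is used once per
    cycle-epoch of `num_groups` rounds.
    """
    if num_clients % num_groups != 0:
        raise ValueError("this sketch assumes equal-sized groups")
    group_size = num_clients // num_groups
    if clients_per_round > group_size:
        raise ValueError("cannot pick more clients than a group contains")

    rng = random.Random(seed)
    # Assumption: group g holds the contiguous block of client indices
    # [g * group_size, (g + 1) * group_size).
    groups = [list(range(g * group_size, (g + 1) * group_size))
              for g in range(num_groups)]

    schedule = []
    for r in range(num_rounds):
        active = groups[r % num_groups]  # cyclic traversal of the groups
        schedule.append(sorted(rng.sample(active, clients_per_round)))
    return schedule


if __name__ == "__main__":
    # Values from the caption: M = 12 clients, 3 groups, N = 2 clients per round.
    for r, picked in enumerate(cyclic_schedule(12, 3, 2, num_rounds=6)):
        print(f"round {r}: group {r % 3}, clients {picked}")
```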

Theorems & Definitions (14)

Theorem 1
Corollary 1
Lemma 1 (Karimireddy et al., 2020)
proof
Lemma 2 (Karimireddy et al., 2020)
proof
Lemma 3: Simple Random Sampling
proof
Lemma 4
proof
...and 4 more
