Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable

Bicheng Ying, Zhe Li, Haibo Yang

TL;DR

This paper addresses the fundamental FL challenge of exact convergence under arbitrary client participation and data heterogeneity. It introduces a stochastic-matrix and time-varying-graph framework to model participation and local updates, and reformulates FL as a constrained optimization solved by a push-pull strategy (FOCUS). The authors prove that FOCUS achieves exact convergence with a linear rate for both strongly convex and PL-condition nonconvex cases, without decaying the learning rate, and extend the framework to SG-FOCUS for stochastic gradients. They also provide an interpretation of FedAvg within this decentralized perspective and demonstrate the practical viability through theoretical rates and supporting experiments. The work establishes a principled connection between FL and decentralized optimization, offering a scalable path to exact convergence under arbitrary participation patterns.

Abstract

This work tackles the fundamental challenges in Federated Learning (FL) posed by arbitrary client participation and data heterogeneity, both prevalent in practical FL settings. It is well established that popular FedAvg-style algorithms struggle with exact convergence and can suffer from slow convergence rates, since a decaying learning rate is required to mitigate these effects. To address these issues, we introduce stochastic matrices and the corresponding time-varying graphs as a novel modeling tool to accurately capture the dynamics of arbitrary client participation and the local update procedure. Leveraging this approach, we offer a fresh decentralized perspective on designing FL algorithms and present FOCUS, Federated Optimization with Exact Convergence via Push-pull Strategy, a provably convergent algorithm designed to overcome the two challenges above. More specifically, we provide a rigorous proof that FOCUS achieves exact convergence with a linear rate regardless of the arbitrary client participation, establishing it as the first work to demonstrate this result.
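To make the stochastic-matrix modeling idea concrete, the sketch below encodes one participation pattern as matrices acting on a stacked vector of models. It is a minimal illustration under stated assumptions, not the paper's construction: node 0 is taken to be the server and nodes 1 to 4 the clients, each model is a single scalar, the averaging weights are uniform over participants, and the helper names (`pull_matrix`, `average_matrix`, `mix`) are hypothetical.

```python
# Illustrative only: node 0 is the server, nodes 1..num_clients are clients.
# Helper names and uniform averaging weights are assumptions, not from the paper.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def pull_matrix(num_clients, participants):
    """Row-stochastic matrix: each participating client copies the server model."""
    n = num_clients + 1
    W = identity(n)
    for c in participants:
        W[c] = [0.0] * n
        W[c][0] = 1.0  # row c reads only from the server (column 0)
    return W

def average_matrix(num_clients, participants):
    """Row-stochastic matrix: the server averages the participating clients."""
    n = num_clients + 1
    W = identity(n)
    W[0] = [0.0] * n
    for c in participants:
        W[0][c] = 1.0 / len(participants)
    return W

def mix(W, x):
    """Apply a mixing matrix to the stacked vector of (scalar) models."""
    return [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]

def is_row_stochastic(W, tol=1e-12):
    return all(min(row) >= 0.0 and abs(sum(row) - 1.0) <= tol for row in W)

x = [10.0, 1.0, 2.0, 3.0, 4.0]       # server model followed by four client models
R = pull_matrix(4, [1, 3])           # only clients 1 and 3 participate
A = average_matrix(4, [1, 3])

pulled = mix(R, x)                   # clients 1 and 3 now hold the server value
averaged = mix(A, x)                 # server now holds the mean of clients 1 and 3
```

Changing the participant list from round to round yields a different matrix each time, which is one way to read the "time-varying graph" description: non-participating clients simply keep an identity row.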

Paper Structure

This paper contains 41 sections, 15 theorems, 153 equations, 9 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Under the arbitrary participation assumption and the $L$-smoothness assumption, it can be proved that FOCUS converges at the following rates with various extra assumptions on $f_i$: where the Lyapunov functions $\Psi_r := \mathbb{E}\,\|\bar{x}_{r\tau+1} - x^\star \|^2 + (1-8\eta \tau LN )\, \mathbb{E}\,\|\mathds{1}\bar{x}_{(r-1)\tau+1} - {\boldsymbol{x}}_{r

Figures (9)

  • Figure 1: The graph representation of the communication pattern of 5 nodes and its possible corresponding stochastic matrices. For clarity, self-loops are not drawn. If node 0 is treated as the server and nodes 1 to 4 as clients, the leftmost graph is a typical pull-model step in which clients 1 and 3 participate; the second graph from the left depicts the model averaging step in FedAvg; the third graph is the same graph but uses a column-stochastic matrix, which is uncommon in the FL literature; the last one is a typical (symmetric) doubly stochastic matrix, as used in decentralized optimization algorithms.
  • Figure 2: Represent FedAvg using graphs. The dashed line means no communication.
  • Figure 3: Illustration of our new FOCUS algorithm. There are two key differences from FedAvg-style algorithms. First, it pulls the model variable ${\boldsymbol{x}}$ but pushes the gradient variable ${\boldsymbol{y}}$; second, the push matrix is column-stochastic rather than row-stochastic.
  • Figure 4: Convergence performance comparison of various FL algorithms. Under full client participation, FedAvg, FedAU, and MIFA exhibit identical performance, as do SCAFFOLD and ProxSkip, due to their theoretical equivalence in this setting. FedAvg and FedAU fail to converge to the optimal solution in all scenarios because their inherent error and bias cannot be eliminated with a fixed learning rate. ProxSkip diverges under uniform and arbitrary participation, as it is not designed for these conditions. We do not understand why MIFA diverges here, even though it works in ML applications. While SCAFFOLD converges in all cases, our proposed algorithm, FOCUS, converges faster, especially under arbitrary participation.
  • Figure 5: An illustration of Decentralized Gradient Descent.
  • ...and 4 more figures
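
The captions for Figures 1 and 3 distinguish row-stochastic (pull) from column-stochastic (push) matrices. The sketch below checks two standard linear-algebra facts that make the distinction matter: a row-stochastic matrix leaves a consensus vector unchanged, while a column-stochastic matrix preserves the sum of the entries it mixes. It is not the FOCUS update itself. The 5-node layout (node 0 as server), the scalar variables, and the choice to build the column-stochastic matrix as the transpose of the pull matrix are assumptions made for illustration.

```python
# Illustrative only: contrasts row- and column-stochastic mixing on 5 nodes.
# Node 0 plays the server; clients 1 and 3 participate. Not the FOCUS algorithm.

def mat_vec(W, v):
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def transpose(W):
    return [list(col) for col in zip(*W)]

# Row-stochastic pull matrix: rows 1 and 3 copy from node 0, other rows are identity.
R = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0],
]

# Transposing a row-stochastic matrix gives a column-stochastic one (assumed choice).
C = transpose(R)

# Pull: every row of R sums to 1, so a vector with identical entries is a fixed point.
consensus = [7.0] * 5
after_pull = mat_vec(R, consensus)

# Push: every column of C sums to 1, so the total of y is conserved; here the
# values held by nodes 1 and 3 are moved onto node 0.
y = [0.5, 1.0, 2.0, 3.0, 4.0]
after_push = mat_vec(C, y)

row_sums = [sum(row) for row in R]
col_sums = [sum(col) for col in zip(*C)]
```

Sum preservation is the property usually relied on when a gradient-like quantity is pushed: whatever is accumulated across nodes is not lost or double counted by the mixing step, even when only some clients take part.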

Theorems & Definitions (15)

  • Theorem 1
  • Theorem 2: Informal Convergence Theorem of SG-FOCUS
  • Theorem 3: Convergence of FedAvg Under Arbitrary Activation
  • Lemma 1: Descent Lemma of FedAvg
  • Lemma 2: Consensus Error of FedAvg
  • Corollary 1: FedAvg Under the Uniform Sampling
  • Corollary 2: FedAvg Under the Uniform Sampling and Single Local Update
  • Corollary 3: FedAvg with Homogeneous Functions
  • Lemma 3: Descent Lemma for FOCUS
  • Lemma 4: Consensus Lemma for FOCUS
  • ...and 5 more