FedSUM Family: Efficient Federated Learning Methods under Arbitrary Client Participation

Runze You, Shi Pu

TL;DR

The paper tackles federated learning under arbitrary client participation by introducing two delay metrics, $\tau_{\max}$ and $\tau_{\text{avg}}$, to quantify participation variability. It proposes the FedSUM family (FedSUM-B, FedSUM, FedSUM-CR), which uses the Stochastic Uplink-Merge (SUM) technique to integrate updates from intermittently active clients and counteract data heterogeneity. The authors provide unified nonconvex convergence guarantees that degrade gracefully as the participation delays grow, and they show that the methods match or exceed the efficiency of existing baselines, with FedSUM-CR offering further communication savings. Empirical results on standard benchmarks with varying participation patterns validate the theoretical findings and show robust performance across heterogeneous settings and tasks, including NLP with SST-2 and image classification.

Abstract

Federated Learning (FL) methods are often designed for specific client participation patterns, limiting their applicability in practical deployments. We introduce the FedSUM family of algorithms, which supports arbitrary client participation without additional assumptions on data heterogeneity. Our framework models participation variability with two delay metrics, the maximum delay $\tau_{\max}$ and the average delay $\tau_{\text{avg}}$. The FedSUM family comprises three variants: FedSUM-B (basic version), FedSUM (standard version), and FedSUM-CR (communication-reduced version). We provide unified convergence guarantees demonstrating the effectiveness of our approach across diverse participation patterns, thereby broadening the applicability of FL in real-world scenarios.
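To make the two delay metrics concrete, the sketch below computes a maximum and an average delay from a participation schedule. The abstract does not state the formal definitions, so the definition used here is an assumption for illustration only: the delay of client i at round t is the number of rounds since i was last active. The function name `participation_delays` and the schedule format are hypothetical.

```python
def participation_delays(schedule, num_clients):
    """Return (tau_max, tau_avg) for a participation schedule.

    schedule: list where schedule[t] is the set of client ids active in round t.
    ASSUMPTION (not taken from the paper): the delay of client i at round t is
    t minus the most recent round <= t in which i was active. A client that has
    never been active is treated as last seen at round -1.
    """
    last_seen = [-1] * num_clients
    delays = []
    for t, active in enumerate(schedule):
        for i in active:
            last_seen[i] = t
        # Record the staleness of every client at this round.
        for i in range(num_clients):
            delays.append(t - last_seen[i])
    tau_max = max(delays)
    tau_avg = sum(delays) / len(delays)
    return tau_max, tau_avg


# Client 0 is active in rounds 0-2, client 1 in rounds 0 and 3.
schedule = [{0, 1}, {0}, {0}, {1}]
tau_max, tau_avg = participation_delays(schedule, num_clients=2)
```

Under this assumed definition, full participation in every round gives both metrics equal to zero, and a single long absence raises the maximum delay much more than the average.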

Paper Structure

This paper contains 29 sections, 32 theorems, 193 equations, 14 figures, 2 tables, 3 algorithms.

Key Result

Theorem 4.1

Suppose Assumptions a.smooth and a.var hold. Under an arbitrary client participation sequence $\left\{{\mathcal{S}}_t\right\}_{t=0}^{T-1}$ characterized by $\tau_{\max}$ and $\tau_{\text{avg}}$, suppose the learning rates for FedSUM-B, FedSUM, and FedSUM-CR are set as Then all three algorithms achieve the following convergence rate: where $\Delta_f := f(x^{(0)}) - f^*$ and $F_0 := \frac{1}{N}\su

Figures (14)

  • Figure 1: Training loss and test accuracy curves for CNN models on three datasets, comparing different FL algorithms under various client participation patterns. The performance is evaluated against (a) the number of communication rounds and (b) the cumulative communication workload. For the workload, one unit corresponds to the transmission of a full-sized model.
  • Figure 2: Illustration of how the Stochastic Uplink-Merge (SUM) technique addresses data heterogeneity and participation bias during the server's model updates.
  • Figure 3: Illustration of how the correction direction $y_i^{t}$ addresses data heterogeneity during each client's local model updates.
  • Figure 4: Data heterogeneity with Dirichlet($\alpha=0.1$) distribution across 100 clients. The x-axis is the client index, and the y-axis is the number of samples. The color bars represent the proportion of each label. Smaller $\alpha$ leads to more non-i.i.d. data.
  • Figure 5: Performance of the evaluated algorithms using CNN models on three datasets under client participation pattern P1.
  • ...and 9 more figures
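
The Dirichlet($\alpha=0.1$) split described in the Figure 4 caption can be sketched as follows. This is the commonly used label-wise Dirichlet partition, not necessarily the authors' exact procedure: for each label, client shares are drawn from a symmetric Dirichlet distribution (sampled here by normalizing Gamma draws) and that label's samples are divided accordingly. The function name `dirichlet_partition` and its parameters are hypothetical.

```python
import random
from itertools import accumulate


def dirichlet_partition(labels, num_clients=100, alpha=0.1, seed=0):
    """Split sample indices across clients with label-wise Dirichlet shares.

    ASSUMPTION: this is the standard label-wise scheme, not confirmed to be
    the paper's exact setup. Smaller alpha concentrates each label on fewer
    clients, which gives more non-i.i.d. data.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idx = by_class[y]
        rng.shuffle(idx)
        # Normalized Gamma(alpha, 1) draws are a symmetric Dirichlet(alpha) sample.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        cum = list(accumulate(w / total for w in weights))
        start = 0
        for client in range(num_clients):
            # The last client takes the remainder so every sample is assigned.
            is_last = client == num_clients - 1
            end = len(idx) if is_last else int(cum[client] * len(idx))
            end = max(start, min(end, len(idx)))
            clients[client].extend(idx[start:end])
            start = end
    return clients


# Toy data: 10 labels with 100 samples each, split across 100 clients.
toy_labels = [i // 100 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=100, alpha=0.1)
```

Each sample index lands on exactly one client, and many clients receive few or no samples of a given label, which is the skew the figure visualizes.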

Theorems & Definitions (59)

  • Remark 2.1
  • Theorem 4.1
  • Corollary 4.1
  • Remark 4.1
  • Remark 4.2
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma C.1
  • ...and 49 more