
Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays

Charikleia Iakovidou, Kibaek Kim

TL;DR

This work tackles asynchronous federated learning under highly heterogeneous, non-IID data, where traditional update schemes suffer bias from uneven client participation. It introduces AREA, a memory-based, residual-update method that eliminates the need for client sampling and handles unbounded delays, while remaining compatible with secure aggregation. The authors prove optimal convergence rates: $\mathcal{O}(1/K)$ for strongly convex, smooth objectives and $\mathcal{O}(1/\sqrt{K})$ for convex, nonsmooth objectives. The rates depend on the average client participation rather than its extrema, and a closed-form expression is given for the optimal server update frequency under Poisson delays. Empirical results on MNIST show fast convergence, robustness to outliers in participation, and higher final accuracy than several asynchronous FL baselines across varied delay patterns and numbers of local steps. Overall, AREA offers a practical, privacy-friendly, and theoretically grounded approach to scalable federated optimization with arbitrary delays.

Abstract

Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Prolonged training times caused by slow clients may hinder the performance of FL; while asynchronous communication is a promising solution, highly heterogeneous client response times under non-IID local data may introduce significant bias to the global model, particularly in client-driven setups where sampling is infeasible. To address this issue, we propose AsynchRonous Exact Averaging (AREA), a stochastic (sub)gradient method that leverages asynchrony for scalability and uses client-side memory to correct the bias induced by uneven participation, without client sampling or prior knowledge of client latencies. AREA communicates model residuals rather than gradient estimates, reducing exposure to gradient inversion, and is compatible with secure aggregation. Under standard assumptions and unbounded, heterogeneous delays with finite mean, AREA achieves optimal convergence rates: $\mathcal{O}(1/K)$ in the strongly convex, smooth regime and $\mathcal{O}(1/\sqrt{K})$ in the convex, nonsmooth regime. For strongly convex, smooth objectives, we demonstrate theoretically and empirically that AREA accommodates larger step sizes than existing methods, enabling fast convergence without adversely impacting model generalization. In the convex, nonsmooth setting, to our knowledge we are the first to obtain rates that scale with the average client update frequency rather than the minimum or maximum, indicating increased robustness to outliers.
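
The memory-and-residual idea described in the abstract can be illustrated with a small toy simulation. This is a minimal sketch under stated assumptions, not the paper's Algorithm: the quadratic client losses, the function names `local_sgd` and `run_residual_averaging`, the participation probabilities, and the serialized arrival of client updates (no staleness is modeled) are all hypothetical choices made for illustration. Each client stores the last model it sent, transmits only the difference between its new local model and that stored copy, and the server adds the residual scaled by 1/n, so the server model stays equal to the average of the client memories regardless of how often each client participates.

```python
import numpy as np


def local_sgd(x, target, step_size, num_steps):
    """Run gradient steps on the toy client loss 0.5 * ||x - target||^2."""
    x = x.copy()
    for _ in range(num_steps):
        x -= step_size * (x - target)  # gradient of the toy loss is (x - target)
    return x


def run_residual_averaging(targets, probs, num_events=2000,
                           step_size=0.1, num_steps=5, seed=0):
    """Toy residual-update loop with uneven client participation.

    targets: (n, d) array, one hypothetical minimizer per client.
    probs:   participation probability of each client per event (assumed).
    """
    rng = np.random.default_rng(seed)
    n, d = targets.shape
    server = np.zeros(d)        # global model
    memory = np.zeros((n, d))   # last local model each client reported
    for _ in range(num_events):
        i = rng.choice(n, p=probs)            # fast clients arrive more often
        new_local = local_sgd(server, targets[i], step_size, num_steps)
        residual = new_local - memory[i]      # only the residual is sent
        memory[i] = new_local                 # client updates its memory
        server = server + residual / n        # server keeps the exact average
    return server, memory


# Client 0 participates seven times as often as each of the others.
targets = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
probs = np.array([0.7, 0.1, 0.1, 0.1])
server, memory = run_residual_averaging(targets, probs)
print("server model:", server)
print("average of client minimizers:", targets.mean(axis=0))
```

In this toy setting the server model approaches the average of the client minimizers even though one client dominates participation, because replacing a client's stored contribution (rather than accumulating every arriving update) prevents frequent clients from being over-counted.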

Paper Structure

This paper contains 36 sections, 15 theorems, 139 equations, 2 figures, 10 tables, 2 algorithms.

Key Result

Theorem 7

Under Assumptions assum:grad_stoch, assum:smooth and assum:strong_convex, let $x^\star = \arg \min_x f(x)$ and suppose that the step size sequence $\{\alpha_k\}$ in Algorithm alg:area_client is defined as follows [...] where $M$ is the number of local SGD steps, $\gamma \triangleq \frac{2\mu L}{\mu + L}$, $L$ and $\mu$ are the Lipschitz and strong-convexity constants defined in Assumptions assum:smooth and a...

Figures (2)

  • Figure 1: Asynchronous FL with non-uniform client update frequencies.
  • Figure 2: Nearly-IID label distribution for 128 clients: clients 1–127 have identical proportions; client 128 contains the leftover samples.

Theorems & Definitions (32)

  • Remark 2
  • Example 3
  • Theorem 7
  • Theorem 8
  • Lemma 9
  • Lemma 10
  • proof
  • Lemma 11
  • proof
  • Lemma 12
  • ...and 22 more