Asynchronous Federated Stochastic Optimization for Heterogeneous Objectives Under Arbitrary Delays

Charikleia Iakovidou; Kibaek Kim

TL;DR

This work tackles asynchronous federated learning under highly heterogeneous, non-IID data, where traditional update schemes suffer bias from uneven participation. It introduces AREA, a memory-based, residual-update method that eliminates the need for client sampling and handles unbounded delays, while remaining compatible with secure aggregation. The authors prove optimal convergence rates: $\mathcal{O}(1/K)$ for strongly convex and smooth objectives and $\mathcal{O}(1/\sqrt{K})$ for convex, nonsmooth cases, with rates depending on average participation rather than extrema, and a closed-form expression for the optimal server update frequency under Poisson delays. Empirical results on MNIST confirm AREA’s fast convergence, robustness to outliers in participation, and superior final accuracy relative to several asynchronous FL baselines across varied delay patterns and local steps. Overall, AREA offers a practical, privacy-friendly, and theoretically solid solution for scalable federated optimization with arbitrary delays.

Abstract

Federated learning (FL) was recently proposed to securely train models with data held over multiple locations ("clients") under the coordination of a central server. Prolonged training times caused by slow clients may hinder the performance of FL; while asynchronous communication is a promising solution, highly heterogeneous client response times under non-IID local data may introduce significant bias to the global model, particularly in client-driven setups where sampling is infeasible. To address this issue, we propose AsynchRonous Exact Averaging (AREA), a stochastic (sub)gradient method that leverages asynchrony for scalability and uses client-side memory to correct the bias induced by uneven participation, without client sampling or prior knowledge of client latencies. AREA communicates model residuals rather than gradient estimates, reducing exposure to gradient inversion, and is compatible with secure aggregation. Under standard assumptions and unbounded, heterogeneous delays with finite mean, AREA achieves optimal convergence rates: $\mathcal{O}(1/K)$ in the strongly convex, smooth regime and $\mathcal{O}(1/\sqrt{K})$ in the convex, nonsmooth regime. For strongly convex, smooth objectives, we demonstrate theoretically and empirically that AREA accommodates larger step sizes than existing methods, enabling fast convergence without adversely impacting model generalization. In the convex, nonsmooth setting, to our knowledge we are the first to obtain rates that scale with the average client update frequency rather than the minimum or maximum, indicating increased robustness to outliers.
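The idea of correcting participation bias with client-side memory and residual messages can be illustrated with a toy simulation. The sketch below is not the paper's algorithm: the scalar quadratic objectives, the deterministic local steps, the immediate server update, and the names `local_sgd` and `simulate` are all assumptions made for illustration. Each client remembers the last model it reported and sends only the difference from that memory, so the server state stays equal to the plain average of the latest client models regardless of how often each client reports. A naive baseline that mixes in whichever model arrives is included for contrast.

```python
import random


def local_sgd(x, center, steps, lr):
    """Run a few gradient steps on the toy objective 0.5 * (x - center)**2."""
    for _ in range(steps):
        x -= lr * (x - center)  # gradient of the toy objective is (x - center)
    return x


def simulate(centers, rates, events=5000, steps=5, lr=0.1, seed=0):
    """Simulate asynchronous arrivals; client i reports proportionally to rates[i].

    Returns (memory_based_server_model, naive_server_model).
    """
    rng = random.Random(seed)
    n = len(centers)
    memory = [0.0] * n  # hypothetical client-side memory: last reported model
    server = 0.0        # kept equal to mean(memory) by the residual updates
    naive = 0.0         # baseline: mixes in the arriving model directly

    for _ in range(events):
        # Fast clients are drawn more often, mimicking uneven participation.
        i = rng.choices(range(n), weights=rates)[0]

        # Memory-based residual update (illustrative assumption).
        new_model = local_sgd(server, centers[i], steps, lr)
        residual = new_model - memory[i]  # only the residual is communicated
        memory[i] = new_model
        server += residual / n

        # Naive asynchronous baseline without memory.
        naive_model = local_sgd(naive, centers[i], steps, lr)
        naive += (naive_model - naive) / n

    return server, naive


if __name__ == "__main__":
    # One fast client with a different optimum than three slow clients.
    centers = [0.0, 10.0, 10.0, 10.0]
    rates = [10.0, 1.0, 1.0, 1.0]
    server, naive = simulate(centers, rates)
    print("minimizer of the average objective:", sum(centers) / len(centers))
    print("memory-based server model:", server)
    print("naive server model:", naive)
```

In this toy setting the memory-based server has the unweighted mean of the client optima as its fixed point, whereas the naive baseline tends to be pulled toward the optimum of the most frequently reporting client.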

Paper Structure (36 sections, 15 theorems, 139 equations, 2 figures, 10 tables, 2 algorithms)

Introduction
Contributions.
Related Work
Asynchronous FL.
AREA: An Asynchronous Method for Federated Stochastic Optimization
Theoretical Results
Results for strongly convex and smooth functions
Dependency on $p_{\min}$.
Two-phase behavior.
Synchronization error.
Comparison with existing methods.
Results for convex and nonsmooth functions
Comparison with existing methods.
Optimal tuning of the server policy.
Numerical Results
...and 21 more sections

Key Result

Theorem 7

Under Assumptions assum:grad_stoch, assum:smooth and assum:strong_convex, let $x^\star = \arg \min_x f(x)$ and suppose that the step size sequence $\{\alpha_k\}$ in Algorithm alg:area_client is defined as follows $M$ is the number of local SGD steps, $\gamma \triangleq \frac{2\mu L}{\mu + L}$, $L$ and $\mu$ are the Lipschitz and strong-convexity constants defined in Assumptions assum:smooth and a

Figures (2)

Figure 1: Asynchronous FL with non-uniform client update frequencies.
Figure 2: Nearly-IID label distribution for 128 clients: clients 1–127 have identical proportions; client 128 contains the leftover samples.

Theorems & Definitions (32)

Remark 2
Example 3
Theorem 7
Theorem 8
Lemma 9
Lemma 10
proof
Lemma 11
proof
Lemma 12
...and 22 more
