Non-Convex Federated Optimization under Cost-Aware Client Selection

Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

TL;DR

The paper introduces a cost-aware model for federated optimization that distinguishes client-selection strategies by their communication costs and defines information-based complexities for fair comparison. It then develops the Inexact Composite Gradient Method with Recursive Gradient estimators (RG-SAGA, RG-SVRG), deriving new variance bounds that tie accuracy to the dissimilarity constant δ rather than to individual smoothness, and achieves the best-known non-convex optimization complexities under partial participation. The Recursive Gradient framework unifies several variance-reduction techniques (e.g., SARAH, STORM) and enables efficient trade-offs between communication and local computation in federated learning. Empirical results on synthetic problems and real-world deep learning tasks (EMNIST, CIFAR-10) illustrate significant gains in communication and computation efficiency, especially when global synchronization is costly. The work opens avenues for extensions including multiple delegates, stochastic or higher-order oracles, and communication compression within cost-aware federated settings.

Abstract

Different federated optimization algorithms typically employ distinct client-selection strategies: some methods communicate only with a randomly sampled subset of clients at each round, while others need to periodically communicate with all clients or use a hybrid scheme that combines both strategies. However, existing metrics for comparing optimization methods typically do not distinguish between these strategies, which often incur different communication costs in practice. To address this disparity, we introduce a simple and natural model of federated optimization that quantifies communication and local computation complexities. This new model allows for several commonly used client-selection strategies and explicitly associates each with a distinct cost. Within this setting, we propose a new algorithm that achieves the best-known communication and local complexities among existing federated optimization methods for non-convex optimization. This algorithm is based on the inexact composite gradient method with a carefully constructed gradient estimator and a special procedure for solving the auxiliary subproblem at each iteration. The gradient estimator is based on SAGA, a popular variance-reduced gradient estimator. We first derive a new variance bound for it, showing that SAGA can exploit functional similarity. We then introduce the Recursive-Gradient technique as a general way to potentially improve the error bound of a given conditionally unbiased gradient estimator, including both SAGA and SVRG. By applying this technique to SAGA, we obtain a new estimator, RG-SAGA, which has an improved error bound compared to the original one.
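
For readers unfamiliar with the estimator the abstract builds on, the following is a minimal sketch of the textbook SAGA gradient estimator under partial participation. It is not the paper's RG-SAGA and does not reproduce its variance bound or the Recursive-Gradient correction. The class name SagaEstimator, the toy quadratic client functions, the step size, and the sample size are all assumptions made for illustration.

```python
import numpy as np


class SagaEstimator:
    """Textbook SAGA estimator over n clients (illustrative, hypothetical names)."""

    def __init__(self, grad_fns, x0):
        # grad_fns[i](x) returns the gradient of client i's local function at x.
        self.grad_fns = grad_fns
        # Table holding the most recently stored gradient of every client.
        self.table = np.stack([g(x0) for g in grad_fns])
        self.table_mean = self.table.mean(axis=0)

    def estimate(self, x, sampled):
        """Return the SAGA estimate at x; `sampled` must contain distinct client indices."""
        fresh = np.stack([self.grad_fns[i](x) for i in sampled])
        diff = fresh - self.table[sampled]
        # Stored average plus the sample-averaged correction (unbiased under uniform sampling).
        estimate = self.table_mean + diff.mean(axis=0)
        # Refresh the table entries of the sampled clients and the running average.
        self.table_mean = self.table_mean + diff.sum(axis=0) / len(self.grad_fns)
        self.table[sampled] = fresh
        return estimate


# Toy setup (assumption): client i holds f_i(x) = 0.5 * ||x - a_i||^2, so grad f_i(x) = x - a_i.
rng = np.random.default_rng(0)
n_clients, dim = 8, 3
centers = rng.normal(size=(n_clients, dim))
grad_fns = [lambda x, a=a: x - a for a in centers]

x = np.zeros(dim)
estimator = SagaEstimator(grad_fns, x)
for _ in range(200):
    # Partial participation: only two randomly chosen clients are contacted per round.
    sampled = rng.choice(n_clients, size=2, replace=False)
    x = x - 0.2 * estimator.estimate(x, sampled)

# Distance to the minimizer of the averaged objective (the mean of the centers).
gap = np.linalg.norm(x - centers.mean(axis=0))
```

Each round touches only the sampled clients, which is the partial-participation regime the abstract contrasts with methods that periodically contact all clients. Note that the table initialization above queries every client once; how that cost is accounted for is exactly the kind of question the paper's cost model addresses.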

Paper Structure

This paper contains 44 sections, 22 theorems, 142 equations, 12 figures, 5 tables.

Key Result

Theorem 3.1

Let Alg:PP be applied to Problem eq:problem. Suppose Assumption assump:ED and condition eq:Condition-SD are satisfied. Let $\lambda > \Delta_1$. Then for any $T \ge 1$, we have:

Figures (12)

  • Figure 1: Illustration of the sequence of procedures performed by a federated optimization algorithm at each communication round $r$. Each client $i \in S_r$ may perform a different number of local steps.
  • Figure 2: Comparison of different algorithms for solving quadratic minimization problems with a non-convex log-sum penalty.
  • Figure 3: Comparison of different algorithms on two LIBSVM datasets using the logistic loss with a non-convex regularizer.
  • Figure E.1: Comparison of different initialization strategies of I-CGM-RG-SAGA for solving quadratic minimization problems with a non-convex log-sum penalty.
  • Figure E.2: Comparison of different values of $p$ (the number of local steps) used in local CGM for I-CGM-RG-SAGA when solving quadratic minimization problems with a non-convex log-sum penalty.
  • ...and 7 more figures

Theorems & Definitions (43)

  • Theorem 3.1
  • Corollary 3.2
  • Lemma 3.3
  • Lemma 3.4
  • Lemma 4.1
  • Remark 4.2
  • Lemma 4.3
  • Lemma 5.2: Error bound for Alg:RG-update
  • Corollary 5.3
  • Corollary 5.4
  • ...and 33 more