Tighter Performance Theory of FedExProx

Wojciech Anyszka, Kaja Gruntkowska, Alexander Tyurin, Peter Richtárik

TL;DR

A novel analysis framework is developed, establishing a tighter linear convergence rate for non-strongly convex quadratic problems and extending the applicability of the analysis to general functions satisfying the Polyak-Lojasiewicz condition, outperforming the previous strongly convex analysis while operating under weaker assumptions.

Abstract

We revisit FedExProx - a recently proposed distributed optimization method designed to enhance convergence properties of parallel proximal algorithms via extrapolation. In the process, we uncover a surprising flaw: its known theoretical guarantees on quadratic optimization tasks are no better than those offered by the vanilla Gradient Descent (GD) method. Motivated by this observation, we develop a novel analysis framework, establishing a tighter linear convergence rate for non-strongly convex quadratic problems. By incorporating both computation and communication costs, we demonstrate that FedExProx can indeed provably outperform GD, in stark contrast to the original analysis. Furthermore, we consider partial participation scenarios and analyze two adaptive extrapolation strategies - based on gradient diversity and Polyak stepsizes - again significantly outperforming previous results. Moving beyond quadratics, we extend the applicability of our analysis to general functions satisfying the Polyak-Lojasiewicz condition, outperforming the previous strongly convex analysis while operating under weaker assumptions. Backed by empirical results, our findings point to a new and stronger potential of FedExProx, paving the way for further exploration of the benefits of extrapolation in federated learning.
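To make the idea of adaptive extrapolation concrete, here is a minimal sketch of a gradient-diversity style extrapolation factor for a parallel proximal step. The exact rule analyzed in the paper is not stated in this summary, so the ratio below (mean squared proximal displacement divided by the squared mean displacement) is an assumption, and the function names `gradient_diversity_alpha` and `extrapolated_step` are hypothetical.

```python
import numpy as np

def gradient_diversity_alpha(x, prox_points):
    """Assumed gradient-diversity ratio computed from client prox outputs.

    prox_points has shape (n, d); row i is the proximal point returned by
    client i when started from x. By Jensen's inequality the ratio is >= 1.
    It is undefined when the mean displacement is exactly zero.
    """
    steps = np.asarray(prox_points, dtype=float) - x   # prox_i(x) - x per client
    mean_sq = np.mean(np.sum(steps ** 2, axis=1))      # average squared norm
    sq_mean = np.sum(np.mean(steps, axis=0) ** 2)      # squared norm of average
    return mean_sq / sq_mean

def extrapolated_step(x, prox_points, alpha):
    """Move from x toward the averaged prox point, scaled by alpha."""
    prox_avg = np.mean(np.asarray(prox_points, dtype=float), axis=0)
    return x + alpha * (prox_avg - x)

# Toy data: two clients whose prox steps point in orthogonal directions.
x = np.zeros(2)
prox_points = np.array([[1.0, 0.0], [0.0, 1.0]])
alpha = gradient_diversity_alpha(x, prox_points)
x_next = extrapolated_step(x, prox_points, alpha)
```

When the client displacements disagree in direction, the ratio grows above 1 and the server takes a longer step than plain averaging (`alpha = 1`) would.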

Paper Structure

This paper contains 38 sections, 31 theorems, 179 equations, 4 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.2

Let Assumptions ass:inter and ass:local_compl hold. Consider solving a non-strongly convex quadratic optimization problem of the form eq:main_problem, where $f_i(x) = \frac{1}{2} x^\top \mathbf{A}_i x - b_i^\top x$ for all $i \in [n]$, with $\mathbf{A}_i \in \textnormal{Sym}^{d}_{+}$ and $b_i \in \mathbb{R}^d$ [...] for all $\gamma > 0$. Moreover, when $\gamma \to 0$, then $\pi(\gamma) \times \frac{L_{\gamma} (1 +$ [...]
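For the quadratic setting of the theorem, the proximal operator has a closed form, $\operatorname{prox}_{\gamma f_i}(x) = (\mathbf{I} + \gamma \mathbf{A}_i)^{-1}(x + \gamma b_i)$, so the extrapolated parallel proximal iteration can be sketched in a few lines. The code below is an illustration only: the function names, the toy matrices, the constant extrapolation factor `alpha`, and the choice $b_i = \mathbf{A}_i x^\star$ (all clients share a minimizer) are assumptions, not details taken from the paper.

```python
import numpy as np

def prox_quadratic(x, A, b, gamma):
    """Prox of f(z) = 0.5 z^T A z - b^T z: solve (I + gamma A) z = x + gamma b."""
    d = x.shape[0]
    return np.linalg.solve(np.eye(d) + gamma * A, x + gamma * b)

def run_extrapolated_prox(A_list, b_list, x0, gamma, alpha, num_rounds):
    """Average the client prox points, then extrapolate by a constant alpha."""
    x = x0.astype(float).copy()
    for _ in range(num_rounds):
        prox_avg = np.mean(
            [prox_quadratic(x, A, b, gamma) for A, b in zip(A_list, b_list)],
            axis=0,
        )
        x = x + alpha * (prox_avg - x)
    return x

def grad_norm(A_list, b_list, x):
    """Norm of the gradient of f = (1/n) sum_i f_i at x."""
    n = len(A_list)
    return np.linalg.norm(sum(A @ x - b for A, b in zip(A_list, b_list)) / n)

# Toy problem: each A_i is singular and so is their average, hence f is
# convex but not strongly convex.
A_list = [np.diag([1.0, 0.0, 0.0]), np.diag([0.0, 2.0, 0.0])]
x_star = np.array([1.0, -1.0, 3.0])
b_list = [A @ x_star for A in A_list]   # assumed: shared minimizer x_star
x0 = np.zeros(3)

x_plain = run_extrapolated_prox(A_list, b_list, x0, gamma=1.0, alpha=1.0, num_rounds=10)
x_extra = run_extrapolated_prox(A_list, b_list, x0, gamma=1.0, alpha=2.0, num_rounds=10)
g_plain = grad_norm(A_list, b_list, x_plain)
g_extra = grad_norm(A_list, b_list, x_extra)
```

On this toy instance the run with `alpha = 2.0` reaches a smaller gradient norm than plain averaging after the same number of rounds. The third coordinate never moves because no client has curvature there, which is what the lack of strong convexity looks like in practice.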

Figures (4)

  • Figure 1: Empirical time complexities of FedExProx on a quadratic optimization task.
  • Figure 2: Empirical time complexities of FedExProx with partial client participation on a quadratic optimization task for $S \in \{1, 7, 14\}$ clients participating in each round.
  • Figure 3: Empirical time complexities of FedExProx on a classification task with smooth hinge loss.
  • Figure 4: Comparison of theoretical time complexity \ref{eq:fedexprox_quad_time} and empirical time complexity \ref{eq:emperical_time_compl}.

Theorems & Definitions (64)

  • Theorem 3.2
  • Remark 3.3
  • Theorem 4.1
  • Remark 4.2
  • Theorem 4.3
  • Example 4.4
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 6.1
  • Remark 6.2
  • ...and 54 more