Stabilized Proximal-Point Methods for Federated Optimization

Xiaowen Jiang, Anton Rodomanov, Sebastian U. Stich

TL;DR

This work proposes a novel distributed algorithm S-DANE, which is inspired by the hybrid-projection proximal-point method and achieves the best-known communication complexity among all existing methods for distributed convex optimization while still enjoying good local computation efficiency.

Abstract

In developing efficient optimization algorithms, it is crucial to account for communication constraints, a significant challenge in modern Federated Learning. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems at each iteration and can exploit second-order similarity among the individual functions. However, to achieve this communication efficiency, the algorithm requires solving the local subproblems sufficiently accurately, resulting in slightly sub-optimal local complexity. Inspired by the hybrid-projection proximal-point method, in this work we propose a novel distributed algorithm, S-DANE. Compared to DANE, this method uses an auxiliary sequence of prox-centers while maintaining the same deterministic communication complexity. Moreover, the accuracy condition for solving the subproblem is milder, leading to enhanced local computation efficiency. Furthermore, S-DANE supports partial client participation and arbitrary stochastic local solvers, making it attractive in practice. We further accelerate S-DANE and show that the resulting algorithm achieves the best-known communication complexity among all existing methods for distributed convex optimization while retaining the same good local computation efficiency as S-DANE. Finally, we propose adaptive variants of both methods using line search, obtaining the first provably efficient adaptive algorithms that can exploit local second-order similarity without prior knowledge of any parameters.
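To make the "proximal-point algorithm with local subproblems" idea concrete, here is a minimal sketch of a DANE-style round on toy quadratics. It is an illustration under stated assumptions, not the paper's S-DANE: it solves each local subproblem exactly, uses no auxiliary prox-center sequence, and the names `make_problem`, `dane_round`, and `run_dane`, as well as the synthetic data and the value of `lam`, are hypothetical.

```python
import numpy as np


def make_problem(n=4, d=5, seed=0):
    """Toy data (assumption): n similar quadratics f_i(x) = 0.5 x^T A_i x - b_i^T x."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((d, d))
    A = M @ M.T / d + np.eye(d)            # shared positive-definite Hessian
    E = 0.05 * rng.standard_normal((n, d, d))
    E = (E + E.transpose(0, 2, 1)) / 2     # small symmetric perturbations
    E -= E.mean(axis=0)                    # average Hessian is exactly A
    A_list = [A + E_i for E_i in E]
    b_list = [rng.standard_normal(d) for _ in range(n)]
    return A_list, b_list


def full_gradient(x, A_list, b_list):
    """Gradient of f = (1/n) sum_i f_i; costs one communication round."""
    return np.mean([A @ x - b for A, b in zip(A_list, b_list)], axis=0)


def dane_round(x, A_list, b_list, lam):
    """One DANE-style round with exact local solves.

    Client i minimizes f_i(y) + <grad f(x) - grad f_i(x), y> + lam/2 ||y - x||^2.
    For quadratics the minimizer is x - (A_i + lam I)^{-1} grad f(x).
    """
    g = full_gradient(x, A_list, b_list)
    eye = np.eye(x.size)
    local = [x - np.linalg.solve(A_i + lam * eye, g) for A_i in A_list]
    return np.mean(local, axis=0)          # server averages the local solutions


def run_dane(rounds=50, lam=1.0, seed=0):
    """Run the sketch and return the final iterate and the true minimizer."""
    A_list, b_list = make_problem(seed=seed)
    x = np.zeros(A_list[0].shape[0])
    for _ in range(rounds):
        x = dane_round(x, A_list, b_list, lam)
    x_star = np.linalg.solve(np.mean(A_list, axis=0), np.mean(b_list, axis=0))
    return x, x_star
```

Because the local Hessians differ only slightly, the correction term `grad f(x) - grad f_i(x)` lets each client behave almost like a solver for the global problem; this is the second-order similarity the abstract refers to. In practice the local solve would be inexact (for example, a few gradient steps), which is the regime the paper's accuracy conditions address.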

Paper Structure (37 sections, 24 theorems, 133 equations, 5 figures, 1 table, 5 algorithms)

Key Result

Theorem 1

Consider S-DANE with $s = n$. Let $f_i \colon \mathbb{R}^d \to \mathbb{R}$ be $\mu$-convex with $\mu \ge 0$ for any $i \in [n]$. Assume that $\{f_i\}_{i=1}^n$ have $\delta$-SOD. Let $\lambda = 2 \delta$ and suppose that, for any $r \geq 0$, we have [...]. Then, for any $R \ge 1$, it holds that [...]. Here, for $\mu = 0$, the expression after the first inequality should be understood as the corresponding limit.
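For orientation, the $\delta$-SOD assumption used above is commonly stated in the second-order-similarity literature in the following form. This is a sketch of the standard formulation, with $h_i$ introduced here as auxiliary notation; the paper's Definition 2 may be phrased differently.

$$
\frac{1}{n} \sum_{i=1}^{n} \bigl\| \nabla h_i(x) - \nabla h_i(y) \bigr\|^2 \le \delta^2 \| x - y \|^2
\quad \text{for all } x, y \in \mathbb{R}^d,
\qquad \text{where } h_i := f - f_i, \quad f := \frac{1}{n} \sum_{i=1}^{n} f_i .
$$

For twice-differentiable quadratics $f_i(x) = \tfrac12 x^\top A_i x - b_i^\top x$ with average Hessian $A$, this reduces to $\frac{1}{n} \sum_i \| (A - A_i)(x - y) \|^2 \le \delta^2 \| x - y \|^2$, so $\delta$ measures how far the local Hessians deviate from their mean rather than how large they are.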

Figures (5)

  • Figure 1: Comparison of S-DANE and Acc-S-DANE with DANE for solving a convex quadratic minimization problem. All three methods use GD as the local solver. S-DANE has better local computation efficiency than DANE, while Acc-S-DANE further improves the communication complexity. Finally, the adaptive variants can leverage local dissimilarities to achieve better performance. (The definitions of local smoothness and dissimilarity can be found in the main experiments section.)
  • Figure 2: Comparisons of different algorithms for solving the polyhedron feasibility problem.
  • Figure 3: Comparison of S-DANE without control variate against other popular optimizers on multi-class classification tasks on the CIFAR10 dataset using ResNet18.
  • Figure 4: Illustration of the impact of the adaptive $\lambda$ used in (Acc-)S-DANE on convergence for a regularized logistic regression problem on the ijcnn dataset [libsvm].
  • Figure E.1: Comparison of S-DANE against DANE for solving a convex quadratic minimization problem with the same number of local steps.

Theorems & Definitions (54)

  • Definition 1: Second-order Dissimilarity
  • Definition 2: $\delta$-SOD [svrp, AccSVRS, fedred]
  • Theorem 1
  • Remark 2
  • Corollary 3
  • Remark 4
  • Definition 3: Bounded Gradient Variance [mime]
  • Definition 4: External Dissimilarity
  • Theorem 5
  • Theorem 6
  • ...and 44 more