Stabilized Proximal-Point Methods for Federated Optimization

Xiaowen Jiang; Anton Rodomanov; Sebastian U. Stich

TL;DR

This work proposes S-DANE, a novel distributed algorithm inspired by the hybrid-projection proximal-point method. Its accelerated variant achieves the best-known communication complexity among all existing methods for distributed convex optimization while retaining good local computation efficiency.

Abstract

In developing efficient optimization algorithms, it is crucial to account for communication constraints -- a significant challenge in modern Federated Learning. The best-known communication complexity among non-accelerated algorithms is achieved by DANE, a distributed proximal-point algorithm that solves local subproblems at each iteration and can exploit second-order similarity among individual functions. However, to achieve such communication efficiency, the algorithm requires solving local subproblems sufficiently accurately, resulting in slightly sub-optimal local complexity. Inspired by the hybrid-projection proximal-point method, in this work we propose a novel distributed algorithm, S-DANE. Compared to DANE, this method uses an auxiliary sequence of prox-centers while maintaining the same deterministic communication complexity. Moreover, the accuracy condition for solving the subproblem is milder, leading to enhanced local computation efficiency. Furthermore, S-DANE supports partial client participation and arbitrary stochastic local solvers, making it attractive in practice. We further accelerate S-DANE and show that the resulting algorithm achieves the best-known communication complexity among all existing methods for distributed convex optimization while retaining the good local computation efficiency of S-DANE. Finally, we propose adaptive variants of both methods using line search, obtaining the first provably efficient adaptive algorithms that can exploit local second-order similarity without prior knowledge of any parameters.
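To make the idea of a DANE-style round with an auxiliary prox-center more concrete, here is a minimal sketch under several assumptions that are not stated on this page: quadratic local objectives f_i(x) = 0.5 x^T A_i x - b_i^T x, full client participation, plain gradient descent as the local solver, and a prox-center update obtained by minimizing a linearized model around the local solutions. The function names (`make_problem`, `local_gd`, `s_dane_sketch`) and the synthetic data are hypothetical, and the exact update rules in the paper's algorithm may differ.

```python
import numpy as np


def make_problem(n=4, d=5, seed=0):
    """Synthetic quadratics with similar Hessians (illustrative data only)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((d, d))
    A_base = M @ M.T / d + np.eye(d)  # shared part, eigenvalues >= 1
    As, bs = [], []
    for _ in range(n):
        G = rng.standard_normal((d, d))
        As.append(A_base + 0.05 * (G + G.T) / 2)  # small symmetric perturbation
        bs.append(rng.standard_normal(d))
    return np.array(As), np.array(bs)


def full_grad(As, bs, x):
    """Gradient of the average objective f = (1/n) * sum_i f_i at x."""
    return np.mean(As @ x - bs, axis=0)


def local_gd(A, b, shift, v, lam, steps):
    """Approximately minimize f_i(x) + <shift, x> + (lam/2) * ||x - v||^2 by GD."""
    step = 1.0 / (np.linalg.eigvalsh(A).max() + lam)  # 1 / smoothness constant
    x = v.copy()
    for _ in range(steps):
        grad = A @ x - b + shift + lam * (x - v)
        x = x - step * grad
    return x


def s_dane_sketch(As, bs, lam, mu, rounds=100, local_steps=100):
    """DANE-style rounds with a separate prox-center v (assumed update rule)."""
    n, d = bs.shape
    v = np.zeros(d)
    x_bar = v.copy()
    for _ in range(rounds):
        g = full_grad(As, bs, v)  # aggregated gradient at the prox-center
        xs = np.array([
            # gradient-corrected local subproblem, warm-started at v
            local_gd(As[i], bs[i], g - (As[i] @ v - bs[i]), v, lam, local_steps)
            for i in range(n)
        ])
        x_bar = xs.mean(axis=0)  # averaged local solutions
        local_grads = np.array([As[i] @ xs[i] - bs[i] for i in range(n)])
        # Assumed prox-center step: minimize over x the model
        # mean_i[<grad f_i(x_i), x> + (mu/2)||x - x_i||^2] + (lam/2)||x - v||^2
        v = (lam * v + mu * x_bar - local_grads.mean(axis=0)) / (lam + mu)
    return x_bar
```

A possible way to call it: take `delta` as the largest spectral norm of `A_i - mean(A)` (a valid similarity bound for quadratics if dissimilarity is measured through Hessian differences), set `lam = 2 * delta`, and take `mu` as the smallest eigenvalue over all `A_i`. When the local subproblems are solved exactly, the prox-center step above returns the averaged local solution, so the sketch reduces to a plain DANE round; the two sequences differ only under inexact local solves.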

Paper Structure (37 sections, 24 theorems, 133 equations, 5 figures, 1 table, 5 algorithms)

Introduction
Contributions.
Related Work.
Problem Setup and Background
Proximal-Point Methods on Single Machine
Proximal-Point Method.
Stabilized Proximal-Point Method.
Distributed Proximal-Point Methods
Stabilized DANE
Full Client Participation.
Partial Client Participation.
Accelerated S-DANE
Dynamic Estimation of Similarity Constant by Line Search
Numerical Experiments
Conclusion
...and 22 more sections

Key Result

Theorem 1

Consider Algorithm S-DANE with $s = n$. Let $f_i \colon \mathbb{R}^d \to \mathbb{R}$ be $\mu$-convex with $\mu \ge 0$ for any $i \in [n]$. Assume that $\{f_i\}_{i=1}^n$ have $\delta$-SOD. Let $\lambda = 2 \delta$ and suppose that, for any $r \geq 0$, we have [...]. Then, for any $R \ge 1$, it holds that [...]. Here, for $\mu = 0$, the expression after the first inequality should be understood as the corresponding lim[...]

Figures (5)

Figure 1: Comparison of S-DANE and Acc-S-DANE with DANE for solving a convex quadratic minimization problem. All three methods use GD as the local solver. S-DANE has better local computation efficiency than DANE, while Acc-S-DANE further improves the communication complexity. Finally, the adaptive variants can leverage local dissimilarities to achieve better performance. (The definitions of local smoothness and dissimilarity can be found in the Numerical Experiments section.)
Figure 2: Comparisons of different algorithms for solving the polyhedron feasibility problem.
Figure 3: Comparison of S-DANE without control variate against other popular optimizers on a multi-class classification task on the CIFAR10 dataset using ResNet18.
Figure 4: Illustration of the impact of the adaptive $\lambda$ used in (Acc-)S-DANE on convergence for a regularized logistic regression problem on the ijcnn dataset from LIBSVM.
Figure E.1: Comparison of S-DANE against DANE for solving a convex quadratic minimization problem with the same number of local steps.

Theorems & Definitions (54)

Definition 1: Second-order Dissimilarity
Definition 2: $\delta$-SOD [svrp, AccSVRS, fedred]
Theorem 1
Remark 2
Corollary 3
Remark 4
Definition 3: Bounded Gradient Variance [mime]
Definition 4: External Dissimilarity
Theorem 5
Theorem 6
...and 44 more
