A Unifying Primal-Dual Proximal Framework for Distributed Nonconvex Optimization

Zichong Ou, Jie Lu

TL;DR

This work proposes a Unifying Primal-Dual Proximal (UPP) framework that unifies a variety of existing first-order and second-order methods and derives two specialized realizations with different communication strategies, namely UPP-MC and UPP-SC.

Abstract

We consider distributed nonconvex optimization over an undirected network, where each node privately possesses its local objective and communicates exclusively with its neighboring nodes, striving to collectively achieve a common optimal solution. To handle the nonconvexity of the objective, we linearize the augmented Lagrangian function and introduce a time-varying proximal term. This approach leads to a Unifying Primal-Dual Proximal (UPP) framework that unifies a variety of existing first-order and second-order methods. Building on this framework, we further derive two specialized realizations with different communication strategies, namely UPP-MC and UPP-SC. We prove that both UPP-MC and UPP-SC achieve stationary solutions for nonconvex smooth problems at a sublinear rate. Furthermore, under the additional Polyak-Łojasiewicz (P-Ł) condition, UPP-MC is linearly convergent to the global optimum. These convergence results provide new or improved guarantees for many existing methods that can be viewed as specializations of UPP-MC or UPP-SC. To further optimize the mixing process, we incorporate Chebyshev acceleration into UPP-SC, resulting in UPP-SC-OPT, which attains an optimal communication complexity bound. Extensive experiments across diverse network topologies demonstrate that our proposed algorithms outperform state-of-the-art methods in both convergence speed and communication efficiency.
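
To give a feel for what Chebyshev acceleration of a mixing process does, the sketch below compares plain repeated averaging with the textbook Chebyshev-polynomial version on a ring network. It is not the UPP-SC-OPT update, whose exact form is not given in this summary. The ring weights (1/3 on each node and its two neighbors), the network size `n`, the round budget `K`, and all function names are assumptions made for illustration only.

```python
import math

def ring_mix(x):
    """One gossip round on a ring: each node averages itself and its two neighbors.
    (Assumed weights of 1/3 each; this is a symmetric, doubly stochastic mixing matrix W.)"""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]

def plain_mixing(x, rounds):
    """Apply W^rounds to x."""
    for _ in range(rounds):
        x = ring_mix(x)
    return x

def chebyshev_mixing(x, rounds, lam):
    """Apply T_K(W / lam) / T_K(1 / lam) to x with K = rounds gossip rounds,
    where T_K is the Chebyshev polynomial of the first kind and lam bounds
    the magnitude of every eigenvalue of W other than 1."""
    if rounds == 0:
        return list(x)
    # Three-term recurrence: T_{k+1}(z) = 2 z T_k(z) - T_{k-1}(z).
    y_prev, t_prev = list(x), 1.0                      # T_0 applied to x, T_0(1/lam)
    y_curr = [v / lam for v in ring_mix(x)]            # T_1 applied to x
    t_curr = 1.0 / lam                                 # T_1(1/lam)
    for _ in range(rounds - 1):
        wy = ring_mix(y_curr)
        y_next = [2.0 * a / lam - b for a, b in zip(wy, y_prev)]
        t_next = 2.0 * t_curr / lam - t_prev
        y_prev, t_prev = y_curr, t_curr
        y_curr, t_curr = y_next, t_next
    # Normalizing by T_K(1/lam) keeps the network average unchanged.
    return [v / t_curr for v in y_curr]

def consensus_error(x):
    """Euclidean distance between x and its network average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))

n, K = 20, 10                                          # assumed sizes
# Second-largest eigenvalue of the assumed ring matrix (known in closed form).
lam = 1.0 / 3.0 + (2.0 / 3.0) * math.cos(2.0 * math.pi / n)
x0 = [float(i) for i in range(n)]                      # arbitrary initial values

plain_err = consensus_error(plain_mixing(x0, K))
cheb_err = consensus_error(chebyshev_mixing(x0, K, lam))
print(f"plain: {plain_err:.4f}  chebyshev: {cheb_err:.4f}")
```

Both routines use the same number of neighbor exchanges. The Chebyshev version contracts the disagreement by at most $1/T_K(1/\lambda)$, against $\lambda^K$ for plain averaging, which is why it helps most on poorly connected graphs such as rings.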

Paper Structure

This paper contains 31 sections, 11 theorems, 92 equations, 4 figures, 2 tables, 4 algorithms.

Key Result

Proposition 1

Suppose the assumptions from "smooth" through "polynomial" hold. Let $\{\mathbf{x}^k\}$ be the sequence generated by UPP-MC with proper parameters. (Footnote: For better readability, the explicit expressions of the parameters and the constants throughout the section on general convergence are given in the correspondin…)

Figures (4)

  • Figure 1: Convergence performance of related works on ring graph (${\gamma}=253.64$). In the legend, the number in parentheses is the number of communication rounds per iteration; this notation also applies to Figures 2, 3, and 4.
  • Figure 2: Convergence performance of related works on grid graph (${\gamma}=36.3$).
  • Figure 3: Convergence performance of related works on geometric graph (${\gamma}=3.98$) with $r=0.5$.
  • Figure 4: Convergence performance of related works on regular graph (${\gamma}=3.29$) with degree $10$.

Theorems & Definitions (23)

  • Remark 1
  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Remark 2
  • Theorem 2
  • proof
  • Lemma 1
  • proof
  • ...and 13 more