On the convergence analysis of the decentralized projected gradient descent method

Woocheol Choi; Jimyeong Kim

TL;DR

This work analyzes the convergence of decentralized projected gradient descent (DPG) for constrained distributed optimization. It develops a novel sequential-estimate framework that leverages the contraction property of projection to bound deviations from the optimum, yielding an $O(\sqrt{\alpha})$-neighborhood for constant stepsizes and $O(t^{-p/2})$-rates for diminishing stepsizes, under standard smoothness and strong convexity assumptions. The authors further improve the bound to $O(\alpha)$ in the half-space domain $\Omega=\mathbb{R}^{d-1}\times\mathbb{R}_+$ with the optimum on the boundary, via a coordinate-splitting analysis. Numerical experiments on non-negative least squares and constrained logistic regression validate the theory and illustrate practical gains from a DPG+P-DIGing hybrid. Collectively, the results advance understanding of constrained decentralized optimization, guiding step-size choices and domain-aware convergence guarantees.
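The projection property that the analysis relies on can be checked numerically for the half-space domain $\Omega=\mathbb{R}^{d-1}\times\mathbb{R}_+$. The sketch below is illustrative only and is not taken from the paper: the function names `project_half_space` and `nonexpansive_gap` are hypothetical, and the random test points are an assumption. It verifies that the Euclidean projection is non-expansive, i.e. $\|\mathcal{P}_\Omega(x)-\mathcal{P}_\Omega(y)\|\leq\|x-y\|$.

```python
import numpy as np

def project_half_space(x):
    """Euclidean projection onto R^{d-1} x R_+ (hypothetical helper).

    Only the last coordinate is constrained, so it is clipped at zero.
    """
    y = np.array(x, dtype=float, copy=True)
    y[..., -1] = np.maximum(y[..., -1], 0.0)
    return y

def nonexpansive_gap(x, y):
    """Return ||P(x) - P(y)|| - ||x - y||, which should never be positive."""
    px, py = project_half_space(x), project_half_space(y)
    return np.linalg.norm(px - py) - np.linalg.norm(np.asarray(x) - np.asarray(y))

# Random points in R^5 (an arbitrary choice for illustration).
rng = np.random.default_rng(0)
gaps = [nonexpansive_gap(rng.normal(size=5), rng.normal(size=5)) for _ in range(1000)]
worst_gap = max(gaps)
```

Here `worst_gap` should be non-positive up to floating-point rounding, since clipping one coordinate can only shrink the distance between two points.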

Abstract

In this work, we are concerned with the decentralized optimization problem: \begin{equation*} \min_{x \in \Omega}~f(x) = \frac{1}{n} \sum_{i=1}^n f_i (x), \end{equation*} where $\Omega \subset \mathbb{R}^d$ is a convex domain and each $f_i : \Omega \rightarrow \mathbb{R}$ is a local cost function known only to agent $i$. A fundamental algorithm is the decentralized projected gradient method (DPG) given by \begin{equation*} x_i(t+1)=\mathcal{P}_{\Omega}\Big[\sum^n_{j=1}w_{ij} x_j(t) -\alpha(t)\nabla f_i(x_i(t))\Big], \end{equation*} where $\mathcal{P}_{\Omega}$ is the projection operator onto $\Omega$ and $\{w_{ij}\}_{1\leq i,j \leq n}$ are the communication weights among the agents. While this method has been widely used in the literature, its convergence property has not been established so far, except for the special case $\Omega = \mathbb{R}^d$. This work establishes new convergence estimates for DPG when the aggregate cost $f$ is strongly convex and each function $f_i$ is smooth. If the stepsize is a constant $\alpha(t) \equiv \alpha > 0$ that is suitably small, we prove that each $x_i (t)$ converges to an $O(\sqrt{\alpha})$-neighborhood of the optimal point. We further improve this result by showing that $x_i (t)$ converges to an $O(\alpha)$-neighborhood of the optimal point if the domain is the half-space $\mathbb{R}^{d-1}\times \mathbb{R}_{+}$, for any dimension $d\in \mathbb{N}$. We also obtain new convergence results for decreasing stepsizes. Numerical experiments are provided to support the convergence results.
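The DPG update in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy non-negative least squares data, the ring-graph weight matrix `W`, the stepsize `0.01`, and all function names (`dpg`, `project_nonneg`) are hypothetical choices made for this example, and the domain is taken to be the non-negative orthant.

```python
import numpy as np

def project_nonneg(x):
    """Euclidean projection onto the non-negative orthant (assumed domain)."""
    return np.maximum(x, 0.0)

def dpg(W, grads, x0, alpha, num_iters, project):
    """x_i(t+1) = P[ sum_j w_ij x_j(t) - alpha(t) * grad f_i(x_i(t)) ].

    W: (n, n) mixing matrix; grads: list of n gradient callables;
    x0: (n, d) array, one row per agent; alpha: callable t -> stepsize.
    """
    x = np.array(x0, dtype=float)
    for t in range(num_iters):
        g = np.stack([grad(x[i]) for i, grad in enumerate(grads)])
        x = project(W @ x - alpha(t) * g)
    return x

# Toy problem (assumption): f_i(x) = 0.5 * ||A_i x - b_i||^2 with 4 agents.
rng = np.random.default_rng(0)
n, d, m = 4, 3, 5
x_true = np.array([1.0, 0.0, 2.0])
A = [rng.normal(size=(m, d)) for _ in range(n)]
b = [Ai @ x_true + 0.1 * rng.normal(size=m) for Ai in A]
grads = [lambda x, Ai=Ai, bi=bi: Ai.T @ (Ai @ x - bi) for Ai, bi in zip(A, b)]

# Doubly stochastic weights on a ring graph (assumption).
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

# Reference minimizer from centralized projected gradient descent.
H = sum(Ai.T @ Ai for Ai in A) / n
step = 1.0 / np.linalg.eigvalsh(H).max()
x_star = np.zeros(d)
for _ in range(5000):
    full_grad = sum(g(x_star) for g in grads) / n
    x_star = project_nonneg(x_star - step * full_grad)

x0 = np.zeros((n, d))
x_final = dpg(W, grads, x0, lambda t: 0.01, 3000, project_nonneg)
initial_error = np.linalg.norm(x0 - x_star, axis=1).max()
final_error = np.linalg.norm(x_final - x_star, axis=1).max()
```

With a constant stepsize, `final_error` is expected to settle at a small nonzero value rather than vanish, in line with the neighborhood-type guarantees described above. Passing a decaying callable for `alpha` gives the diminishing-stepsize variant.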

Paper Structure (28 sections, 22 theorems, 198 equations, 6 figures, 1 table)

Introduction
Related works and contributions of this work
Notations and organizations
Assumptions and main results
Consensus results
Convergence results: Constant stepsize
Convergence results: Diminishing stepsize
The main ideas of this work
The argument of [23]
The idea for the $O(\sqrt{\alpha})$-convergence
The idea for the $O({\alpha})$-convergence
Analysis for main results
Preparation: Sequential estimate
Consensus analysis (Proof of Theorem \ref{thm2-5})
Convergence analysis (Proof of Theorem \ref{thm2-3} and Theorem \ref{thm2-6})
...and 13 more sections

Key Result

Theorem 2.5

There exists a constant $R_s >0$ such that the uniform boundedness estimate holds for all $t\geq 0$ if at least one of the following statements holds true:

Figures (6)

Figure 1: The consequences of Proposition \ref{prop-3-1} and Proposition \ref{prop-3-2}.
Figure 1: The graphs of $\log R(t)$ under various choices of constant stepsizes (left), diminishing stepsizes (right).
Figure 1: The overall flow of the proofs of Proposition \ref{prop-3-1} and Proposition \ref{prop-3-2}.
Figure 2: The graphs of $\log R(t)$ with P-DIGing, DPG using a constant step size, DPG+P-DIGing, and DPG with a diminishing step size.
Figure 3: The graph of $\log R(t)$ with P-DIGing, DPG using a constant step size, DPG+P-DIGing.
...and 1 more figure

Theorems & Definitions (50)

Theorem 2.5: Conditions for uniform boundedness
Theorem 2.7: Consensus
Theorem 2.8: Convergence for constant stepsize
Remark 2.9
Theorem 2.11
Theorem 2.12: Convergence for diminishing stepsize
Remark 2.13
Remark 2.14
Remark 2.15
Proposition 3.1
...and 40 more
