On the convergence analysis of the decentralized projected gradient descent method
Woocheol Choi, Jimyeong Kim
TL;DR
This work analyzes the convergence of decentralized projected gradient descent (DPG) for constrained distributed optimization. It develops a novel sequential-estimate framework that leverages the contraction property of the projection to bound deviations from the optimum. Under standard smoothness and strong convexity assumptions, this yields an $O(\sqrt{\alpha})$-neighborhood for constant stepsizes and $O(t^{-p/2})$-rates for diminishing stepsizes. The authors further improve the bound to $O(\alpha)$ in the half-space domain $\Omega=\mathbb{R}^{d-1}\times\mathbb{R}_+$ with the optimum on the boundary, via a coordinate-splitting analysis. Numerical experiments on non-negative least squares and constrained logistic regression validate the theory and illustrate practical gains from a DPG+P-DIGing hybrid. Collectively, the results clarify how the stepsize and the geometry of the domain govern the accuracy of DPG in constrained decentralized optimization.
Abstract
In this work, we are concerned with the decentralized optimization problem \begin{equation*} \min_{x \in \Omega}~f(x) = \frac{1}{n} \sum_{i=1}^n f_i (x), \end{equation*} where $\Omega\subset \mathbb{R}^d$ is a convex domain and each $f_i : \Omega\rightarrow \mathbb{R}$ is a local cost function known only to agent $i$. A fundamental algorithm for this problem is the decentralized projected gradient method (DPG), given by \begin{equation*} x_i(t+1)=\mathcal{P}_\Omega\Big[\sum^n_{j=1}w_{ij} x_j(t) -\alpha(t)\nabla f_i(x_i(t))\Big], \end{equation*} where $\mathcal{P}_\Omega$ is the projection operator onto $\Omega$ and $\{w_{ij}\}_{1\leq i,j \leq n}$ are the communication weights among the agents. Although this method has been widely used in the literature, its convergence properties have not been established so far, except for the special case $\Omega= \mathbb{R}^d$. This work establishes new convergence estimates for DPG when the aggregate cost $f$ is strongly convex and each function $f_i$ is smooth. If the stepsize is a constant $\alpha(t) \equiv \alpha>0$ that is suitably small, we prove that each $x_i (t)$ converges to an $O(\sqrt{\alpha})$-neighborhood of the optimal point. We further improve this result by showing that $x_i (t)$ converges to an $O(\alpha)$-neighborhood of the optimal point when the domain is the half-space $\mathbb{R}^{d-1}\times \mathbb{R}_{+}$, for any dimension $d\in \mathbb{N}$. We also obtain new convergence results for decreasing stepsizes. Numerical experiments are provided to support the convergence results.
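To make the DPG update concrete, here is a minimal sketch under illustrative assumptions that are not taken from the paper: quadratic local costs $f_i(x) = \frac{1}{2}\|x - c_i\|^2$, four agents on a ring with weights $w_{ii} = 1/2$ and $w_{ij} = 1/4$ for each neighbor, a constant stepsize $\alpha$, and $\Omega = \mathbb{R}^{d-1}\times\mathbb{R}_+$ with $d = 2$. Projection onto this half-space simply clamps the last coordinate at zero. The helper names (`project_half_space`, `ring_weights`, `dpg`, `max_error`) and the data are hypothetical; this is not the authors' code or experimental setup.

```python
import math


def project_half_space(x):
    """Projection onto Omega = R^{d-1} x R_+: clamp only the last coordinate at 0."""
    return x[:-1] + [max(x[-1], 0.0)]


def ring_weights(n):
    """Hypothetical symmetric, doubly stochastic ring weights: 1/2 self, 1/4 per neighbor."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.5
        w[i][(i - 1) % n] += 0.25
        w[i][(i + 1) % n] += 0.25  # '+=' keeps rows summing to 1 if neighbors coincide
    return w


def dpg(centers, alpha, num_iters=12000):
    """DPG with constant stepsize alpha on local costs f_i(x) = 0.5 * ||x - c_i||^2."""
    n, d = len(centers), len(centers[0])
    w = ring_weights(n)
    x = [[0.0] * d for _ in range(n)]  # x_i(0) = 0 for every agent
    for _ in range(num_iters):
        new_x = []
        for i in range(n):
            # Mixing step: sum_j w_ij x_j(t)
            mixed = [sum(w[i][j] * x[j][k] for j in range(n)) for k in range(d)]
            # Local gradient: grad f_i(x_i(t)) = x_i(t) - c_i
            grad = [x[i][k] - centers[i][k] for k in range(d)]
            step = [mixed[k] - alpha * grad[k] for k in range(d)]
            new_x.append(project_half_space(step))
        x = new_x  # synchronous update: every agent used x(t)
    return x


def max_error(x, x_star):
    """Largest Euclidean distance from any agent's iterate to the optimum."""
    return max(math.dist(xi, x_star) for xi in x)


# Hypothetical data: 4 agents in d = 2.
centers = [[2.0, 1.0], [0.0, -3.0], [2.0, 1.0], [0.0, -3.0]]
# f(x) = 0.5 * ||x - mean(c)||^2 + const, so the constrained minimizer is the
# projection of the mean (1, -1) onto Omega, i.e. (1, 0), on the boundary.
mean = [sum(c[k] for c in centers) / len(centers) for k in range(2)]
x_star = project_half_space(mean)

errors = {alpha: max_error(dpg(centers, alpha), x_star) for alpha in (0.01, 0.005)}
print(errors)
```

On this toy instance the largest agent error is roughly 0.022 for $\alpha = 0.01$ and roughly 0.011 for $\alpha = 0.005$, so halving the stepsize about halves the error. This is in line with the $O(\alpha)$-neighborhood stated for the half-space case, but a single small example illustrates rather than proves the estimate.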
