Adaptive Decentralized Composite Optimization via Three-Operator Splitting

Xiaokai Chen, Ilya Kuruzov, Gesualdo Scutari

TL;DR

Under mere convexity, the proposed methods converge at a sublinear rate; under strong convexity of the sum-function, and assuming the nonsmooth component is partly smooth, linear convergence is proved.

Abstract

The paper studies decentralized optimization over networks, where agents minimize a sum of locally smooth (strongly) convex losses plus a nonsmooth convex extended-value term. We propose decentralized methods wherein agents adaptively adjust their stepsize via local backtracking procedures coupled with lightweight min-consensus protocols. Our design stems from a three-operator splitting factorization applied to an equivalent reformulation of the problem. The reformulation is endowed with a new BCV preconditioning metric (Bertsekas-O'Connor-Vandenberghe), which enables efficient decentralized implementation and local stepsize adjustments. We establish robust convergence guarantees. Under mere convexity, the proposed methods converge at a sublinear rate. Under strong convexity of the sum-function, and assuming the nonsmooth component is partly smooth, we further prove linear convergence. Numerical experiments corroborate the theory and highlight the effectiveness of the proposed adaptive stepsize strategy.
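To make the phrase "local backtracking procedures coupled with lightweight min-consensus protocols" concrete, here is a minimal sketch of the general pattern, not the paper's algorithm. It assumes scalar quadratic local losses, a generic sufficient-decrease backtracking test, and a fixed path graph; the function names `backtrack_stepsize` and `min_consensus` are hypothetical. Each agent first picks a stepsize from purely local information, then the agents repeatedly exchange values with neighbors and keep the smallest, so that after a number of rounds equal to the graph diameter they all hold the same (smallest) stepsize.

```python
from typing import Callable, Dict, List


def backtrack_stepsize(
    f: Callable[[float], float],
    grad: Callable[[float], float],
    x: float,
    alpha0: float,
    shrink: float = 0.5,
    max_trials: int = 60,
) -> float:
    """Generic backtracking (an assumption, not the paper's rule): shrink alpha
    until the gradient step satisfies f(x - alpha*g) <= f(x) - (alpha/2)*g^2."""
    g = grad(x)
    alpha = alpha0
    for _ in range(max_trials):
        if f(x - alpha * g) <= f(x) - 0.5 * alpha * g * g:
            break
        alpha *= shrink
    return alpha


def min_consensus(
    values: Dict[int, float],
    neighbors: Dict[int, List[int]],
    rounds: int,
) -> Dict[int, float]:
    """Each round, every agent keeps the minimum of its own value and its
    neighbors' values. On a connected graph, `rounds >= diameter` suffices
    for all agents to agree on the global minimum."""
    current = dict(values)
    for _ in range(rounds):
        current = {
            i: min([current[i]] + [current[j] for j in neighbors[i]])
            for i in current
        }
    return current


# Illustrative setup: 4 agents on a path graph 0-1-2-3, each holding the
# local loss f_i(x) = 0.5 * c_i * x^2 with a different curvature c_i.
curvatures = {0: 1.0, 1: 4.0, 2: 10.0, 3: 2.0}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

local = {
    i: backtrack_stepsize(
        lambda x, c=c: 0.5 * c * x * x,  # local loss
        lambda x, c=c: c * x,            # its gradient
        x=1.0,
        alpha0=1.0,
    )
    for i, c in curvatures.items()
}
agreed = min_consensus(local, neighbors, rounds=3)  # path diameter is 3

print("local stepsizes: ", local)
print("after consensus: ", agreed)
```

The agent with the largest curvature ends up with the smallest local stepsize, and min-consensus propagates that conservative value to everyone. Only scalar values travel over the network, which is why such a protocol is cheap compared with exchanging iterates.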

Paper Structure

This paper contains 23 sections, 18 theorems, 92 equations, 3 figures, and 3 algorithms.

Key Result

Lemma 2.2

Suppose Assumption ass:function holds. For any given $({\mathscr A}^{k},{\mathscr X}^{k},{\mathscr S}^{k})$, $\alpha^k>0$, and fixed $({\mathscr X},{\mathscr S})\in \operatorname{dom}(\mathcal{L})$, the iterate $({\mathscr A}^{k+1},{\mathscr X}^{k+1},{\mathscr S}^{k+1})$ given by eq:ATOS satisfies [...], where $L^k$ is the local estimate of the curvature of $F$ at $\mathbf{X}^k$ along $\mathbf{A}^{k+1}$ [...]

Figures (3)

  • Figure 1: Logistic regression with $\ell_1$-regularization: $\frac{1}{m}\sum_{i=1}^m u(x_i)-u(x^*)$ vs. number of iterations. Comparison of PG-EXTRA, SONATA, adaPDM, adaPDM2, global_DATOS and local_DATOS on Erdős-Rényi graphs with edge probability $p=0.1$ (left), $p=0.5$ (middle), and $p=0.9$ (right).
  • Figure 2: ML estimate of the covariance matrix: $\frac{1}{m}\sum_{i=1}^m u(x_i)-u(x^*)$ vs. number of iterations. Comparison of PG-EXTRA, SONATA, adaPDM, adaPDM2, global_DATOS and local_DATOS on Erdős-Rényi graphs with edge probability $p=0.1$ (left), $p=0.5$ (middle), and $p=0.9$ (right).
  • Figure 3: Linear regression with elastic net regularization: $\|\mathbf{X}^k-\mathbf{X}^*\|^2$ vs. number of iterations. Comparison of PG-EXTRA, SONATA, adaPDM, adaPDM2, global_DATOS and local_DATOS on Erdős-Rényi graphs with edge probability $p=0.1$ (left), $p=0.5$ (middle), and $p=0.9$ (right).
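For readers who want to set up comparable network topologies, the following is a minimal sketch of sampling an Erdős-Rényi graph with edge probability $p$ and checking that it is connected. The source does not say how the graphs in the figures were generated, so the helper names (`erdos_renyi`, `is_connected`), the seeding, and the connectivity check are all assumptions. A sparse draw such as $p=0.1$ can come out disconnected, in which case one would typically resample.

```python
import random
from collections import deque
from typing import Dict, List, Optional


def erdos_renyi(m: int, p: float, seed: Optional[int] = None) -> Dict[int, List[int]]:
    """Sample an undirected graph on m nodes: each of the m*(m-1)/2 possible
    edges is included independently with probability p."""
    rng = random.Random(seed)
    neighbors: Dict[int, List[int]] = {i: [] for i in range(m)}
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors


def is_connected(neighbors: Dict[int, List[int]]) -> bool:
    """Breadth-first search from an arbitrary node; the graph is connected
    if and only if every node is reached."""
    if not neighbors:
        return True
    start = next(iter(neighbors))
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in neighbors[i]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(neighbors)


# Illustrative use with the three edge probabilities from the figures;
# the number of agents (20) and the seed are arbitrary choices.
for p in (0.1, 0.5, 0.9):
    graph = erdos_renyi(20, p, seed=0)
    n_edges = sum(len(v) for v in graph.values()) // 2
    print(f"p={p}: {n_edges} edges, connected={is_connected(graph)}")
```

Larger $p$ gives denser, better-connected networks, which is the axis along which the left, middle, and right panels differ.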

Theorems & Definitions (32)

  • Definition 2.1: gossip matrices
  • Lemma 2.2
  • Proof 1
  • Lemma 2.3
  • Lemma 2.4
  • Lemma 2.5: ryu2022large
  • Proof 2
  • Lemma 3.1
  • Proof 3
  • Theorem 3.2
  • ...and 22 more