Exact Decentralized Optimization via Explicit $\ell_1$ Consensus Penalties

Hong Wang

TL;DR

This work tackles decentralized consensus optimization under memory and communication constraints by reformulating the problem with an explicit $\ell_1$ penalty that yields exact consensus once the penalty weight exceeds a computable threshold. It introduces a modular two-layer penalty-continuation framework in which an outer loop adjusts the penalty weight and an inner plug-and-play saddle-point solver (instantiated as DP$^2$G, a proximal-gradient variant) reaches consensus with constant stepsizes while storing only one dual residual per agent. The author proves global convergence via the Kurdyka-Łojasiewicz framework, vanishing disagreement, and linear rates under strong convexity, and validates the approach on distributed ridge regression, logistic regression, and elastic-net tasks across several network topologies, where it is faster and more communication-efficient than DGD-type methods and competitive with gradient-tracking methods. The resulting penalty-based approach is tracker-free yet exact, and it applies to composite objectives and constrained settings.
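
To make the threshold behaviour concrete, here is a minimal sketch on a two-agent toy problem that is not taken from the paper. Each agent holds a scalar quadratic $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, and the single edge carries the penalty $\rho\,|x_1 - x_2|$. For this toy the penalized minimizer has a closed form (a soft-threshold of the gap $a_1 - a_2$), and the two agents agree exactly once $\rho \ge |a_1 - a_2|/2$. The function names and the doubling schedule in `continuation` are illustrative assumptions; the closed-form solve merely stands in for the inner saddle-point solver.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (the proximal map of t * |.|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def solve_penalized(a1, a2, rho):
    """Minimize 0.5*(x1-a1)**2 + 0.5*(x2-a2)**2 + rho*|x1-x2| in closed form."""
    mean = 0.5 * (a1 + a2)                     # the average is unaffected by the penalty
    gap = soft_threshold(a1 - a2, 2.0 * rho)   # optimal disagreement x1 - x2
    return mean + 0.5 * gap, mean - 0.5 * gap


def continuation(a1, a2, rho0=0.25, growth=2.0, tol=1e-12, max_outer=50):
    """Hypothetical outer loop: raise rho until the toy solution is in consensus."""
    rho = rho0
    for _ in range(max_outer):
        x1, x2 = solve_penalized(a1, a2, rho)
        if abs(x1 - x2) <= tol:
            break
        rho *= growth
    return rho, x1, x2
```

With `a1 = 3.0` and `a2 = 1.0` the threshold is `1.0`: `solve_penalized(3.0, 1.0, 0.5)` leaves a gap between the agents, while any `rho >= 1.0` returns the consensus value `2.0` for both.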

Abstract

Consensus optimization enables autonomous agents to solve joint tasks through peer-to-peer exchanges alone. Classical decentralized gradient descent is appealing for its minimal state but fails to achieve exact consensus with fixed stepsizes unless additional trackers or dual variables are introduced. We revisit penalty methods and introduce a decentralized two-layer framework that couples an outer penalty-continuation loop with an inner plug-and-play saddle-point solver. Any primal-dual routine that satisfies simple stationarity and communication conditions can be used; when instantiated with a proximal-gradient solver, the framework yields the DP$^2$G algorithm, which reaches exact consensus with constant stepsizes, stores only one dual residual per agent, and requires exactly two short message exchanges per inner iteration. An explicit $\ell_1$ penalty enforces agreement and, once above a computable threshold, makes the penalized and constrained problems equivalent. Leveraging the Kurdyka-Łojasiewicz property, we prove global convergence, vanishing disagreement, and linear rates for strongly convex objectives under any admissible inner solver. Experiments on distributed least squares, logistic regression, and elastic-net tasks across various networks demonstrate that DP$^2$G outperforms DGD-type methods in both convergence speed and communication efficiency, is competitive with gradient-tracking approaches while using less memory, and naturally accommodates composite objectives.
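
The remark that decentralized gradient descent with a fixed stepsize does not reach exact consensus can be checked on a small example. The sketch below is an illustration under assumed data, not the paper's experiment: two agents with scalar objectives $f_i(x) = \tfrac{1}{2}(x - a_i)^2$, a uniform averaging matrix `W`, and a constant stepsize `alpha`. For this toy the DGD fixed point keeps a disagreement of $\alpha\,(a_1 - a_2)/(1 + \alpha)$, which shrinks only as the stepsize shrinks.

```python
import numpy as np


def dgd_fixed_step(a, W, alpha, iters=500):
    """Run x <- W x - alpha * grad f(x) for f_i(x) = 0.5 * (x - a_i)**2."""
    x = np.zeros_like(a, dtype=float)
    for _ in range(iters):
        x = W @ x - alpha * (x - a)   # mix with neighbours, then local gradient step
    return x


a = np.array([3.0, 1.0])              # assumed local targets; the exact minimizer is 2.0
W = np.full((2, 2), 0.5)              # doubly stochastic mixing matrix for two agents
x = dgd_fixed_step(a, W, alpha=0.1)
disagreement = x[0] - x[1]            # stays near 0.1 * 2 / 1.1, not zero
```

The network average still converges to the minimizer `2.0` here, but the individual iterates settle on different values, which is the gap that trackers, dual variables, or an exact penalty are meant to close.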

Paper Structure

This paper contains 41 sections, 11 theorems, 33 equations, 8 figures, 2 tables, 1 algorithm.

Key Result

Lemma 2.1

Let $g_i(\mathbf{x}) \in (\partial_{\mathbf{x}} \|Z\mathbf{x}\|_1)_i$ be a subgradient of the penalty with respect to agent $i$. Then where the $\operatorname{sign}(\cdot)$ operator acts component-wise and produces subgradients in $[-1,1]$ when the corresponding residual coordinate is zero.
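
As a hedged illustration of the sign-based structure the lemma describes, the sketch below assumes that $Z$ is a signed edge-incidence operator, so that each edge $(i, j)$ contributes the residual $\mathbf{x}_i - \mathbf{x}_j$ to $Z\mathbf{x}$. That definition of $Z$ is not shown in this excerpt, so both the assumption and the helper name `penalty_subgradient` are hypothetical. Under it, one valid subgradient for agent $i$ is the sum of component-wise signs of its edge residuals, with `np.sign(0) = 0` picking one admissible value from $[-1, 1]$ at zero coordinates.

```python
import numpy as np


def penalty_subgradient(x, edges):
    """Return one subgradient of sum over edges of ||x_i - x_j||_1, per agent.

    x     : array of shape (n_agents, dim), one row per agent
    edges : list of (i, j) index pairs (assumed orientation of each edge)
    """
    g = np.zeros_like(x, dtype=float)
    for i, j in edges:
        s = np.sign(x[i] - x[j])   # component-wise sign; 0 where the residual is 0
        g[i] += s                  # edge residual enters agent i with +1
        g[j] -= s                  # and agent j with -1
    return g


ring = [(0, 1), (1, 2), (2, 0)]                 # assumed 3-agent ring
x = np.array([[1.0], [0.0], [0.0]])
g = penalty_subgradient(x, ring)
```

Each agent's entry depends only on residuals with its own neighbours, so it can be formed from neighbour messages alone, and at a consensus point this particular selection is zero for every agent.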

Figures (8)

  • Figure 1: Network topologies used in the experiments: ring (top left), $4\times5$ grid (top right), and random geometric graph (bottom).
  • Figure 2: Ridge regression on the ring: objective residual (top) and consensus violation (bottom). Enlarged panels reveal the linear tail achieved by DP$^2$G while other one-state baselines stall.
  • Figure 3: Ridge regression on the $4\times5$ grid: objective residual (top) and consensus violation (bottom). DP$^2$G tracks EXTRA closely while using only one auxiliary vector per agent.
  • Figure 4: Ridge regression on the random geometric graph: objective residual (top) and consensus violation (bottom). Improved connectivity benefits every method, and DP$^2$G retains the best communication-versus-accuracy trade-off among one-state schemes.
  • Figure 5: Logistic regression on the ring: objective residual (top) and optimality residual (bottom). Gradient tracking clearly accelerates, yet DP$^2$G maintains steady decay with a single dual vector per node.
  • ...and 3 more figures

Theorems & Definitions (22)

  • Lemma 2.1: Local structure of the penalty subgradient
  • proof
  • Theorem 2.2: Exactness of $\ell_1$ penalty
  • proof
  • Lemma 3.1: Stationarity residual
  • proof
  • Lemma 4.1: Critical points of $\Psi_\rho$
  • proof
  • Lemma 4.2: One-step Lyapunov descent
  • proof
  • ...and 12 more