
Heterogeneous Stochastic Momentum ADMM for Distributed Nonconvex Composite Optimization

Yangming Zhang, Yongyang Xiong, Jinming Xu, Keyou You, Yang Shi

TL;DR

This paper proposes a novel Heterogeneous Stochastic Momentum Alternating Direction Method of Multipliers (HSM-ADMM), which completely decouples the algorithmic stability from global network properties, enabling robust and accelerated convergence across arbitrary connected topologies without requiring any global structural knowledge.

Abstract

This paper investigates the distributed stochastic nonconvex and nonsmooth composite optimization problem. Existing stochastic methods typically rely on a uniform step size strictly bounded by global network parameters, such as the maximum node degree or spectral radius. This dependency creates a severe performance bottleneck, particularly in heterogeneous network topologies where the step size must be conservatively reduced to ensure stability. To overcome this limitation, we propose a novel Heterogeneous Stochastic Momentum Alternating Direction Method of Multipliers (HSM-ADMM). By integrating a recursive momentum estimator (STORM), HSM-ADMM achieves the optimal oracle complexity of $\mathcal{O}(\epsilon^{-1.5})$ to reach an $\epsilon$-stationary point, using a strictly single-loop structure and an $\mathcal{O}(1)$ mini-batch size. The core innovation lies in a node-specific adaptive step-size strategy, which scales the proximal term according to local degree information. We theoretically demonstrate that this design completely decouples the algorithmic stability from global network properties, enabling robust and accelerated convergence across arbitrary connected topologies without requiring any global structural knowledge. Furthermore, HSM-ADMM requires transmitting only a single primal variable per iteration, significantly reducing communication bandwidth compared to state-of-the-art gradient tracking algorithms. Extensive numerical experiments on distributed nonconvex learning tasks validate the superior efficiency of the proposed HSM-ADMM algorithm.
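
The recursive momentum estimator (STORM) named in the abstract can be illustrated with a minimal single-node sketch. This is not the HSM-ADMM algorithm: there is no network, no ADMM update, and no node-specific step size. It only shows the standard STORM recursion, in which the same stochastic sample is evaluated at both the new and the previous iterate. The toy objective $f(x) = \tfrac{1}{2}\|x\|^2$, the additive-noise oracle, and all names and parameter values (`storm_update`, `storm_run`, `lr`, `a`, `sigma`) are assumptions made for illustration.

```python
import numpy as np


def stochastic_grad(x, noise):
    """Noisy gradient of the toy objective f(x) = 0.5 * ||x||^2 (assumed)."""
    return x + noise


def storm_update(d, g_new, g_old, a):
    """STORM recursion: d_next = g_new + (1 - a) * (d - g_old).

    g_new and g_old must be computed with the SAME sample, at the new and
    the previous iterate respectively. With a = 1 this is plain SGD.
    """
    return g_new + (1.0 - a) * (d - g_old)


def storm_run(x0, steps=200, lr=0.1, a=0.1, sigma=0.5, seed=0):
    """Run gradient steps driven by the STORM estimator (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    # Initialize the estimator with a single stochastic gradient.
    d = stochastic_grad(x, sigma * rng.standard_normal(x.shape))
    for _ in range(steps):
        x_new = x - lr * d
        # One fresh sample, reused at both x_new and x.
        noise = sigma * rng.standard_normal(x.shape)
        d = storm_update(
            d,
            stochastic_grad(x_new, noise),
            stochastic_grad(x, noise),
            a,
        )
        x = x_new
    return x, d


if __name__ == "__main__":
    x_final, d_final = storm_run([5.0, -3.0])
    print("final iterate:", x_final)
    print("final gradient estimate:", d_final)
```

Reusing one sample for both gradient evaluations is what keeps the batch size at $\mathcal{O}(1)$: the correction term `d - g_old` removes most of the noise carried over from the previous estimate, so no large batch or inner loop is needed.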

Paper Structure

This paper contains 21 sections, 5 theorems, 104 equations, 3 figures, 1 table, 1 algorithm.

Key Result

Lemma 1

Suppose that the assumptions on the network (undirected and connected), mean-squared smoothness, lower boundedness of $f$ and $g$, and unbiasedness and bounded variance hold. Let $\left( \mathbf{x}^k, \mathbf{y}^k, \boldsymbol{\lambda}^k \right)$ be the sequence generated by Algorithm 1. For any constant $\theta > 0$, […] where $\Delta \boldsymbol{\lambda}^{k+1} \triangleq \boldsymbol{\lambda}^{k+1} - \ldots$

Figures (3)

  • Figure 1: Network topologies with 8 nodes
  • Figure 2: Performance comparison of distributed algorithms on a9a over a ring topology.
  • Figure 3: Performance comparison of distributed algorithms on MNIST over a random topology.

Theorems & Definitions (9)

  • Definition 1: $\epsilon$-stationary Point
  • Lemma 1
  • Remark 1
  • Lemma 2
  • Lemma 3
  • Theorem 1
  • Remark 2: Asymptotic Regularity
  • Theorem 2: Optimal Convergence Rate
  • Remark 3: Optimality and Topology Independence