Heterogeneous Stochastic Momentum ADMM for Distributed Nonconvex Composite Optimization

Yangming Zhang; Yongyang Xiong; Jinming Xu; Keyou You; Yang Shi

TL;DR

This paper proposes a novel Heterogeneous Stochastic Momentum Alternating Direction Method of Multipliers (HSM-ADMM), which completely decouples the algorithmic stability from global network properties, enabling robust and accelerated convergence across arbitrary connected topologies without requiring any global structural knowledge.

Abstract

This paper investigates the distributed stochastic nonconvex and nonsmooth composite optimization problem. Existing stochastic methods typically rely on a uniform step size that is strictly bounded by global network parameters, such as the maximum node degree or the spectral radius. This dependency creates a severe performance bottleneck, particularly in heterogeneous network topologies, where the step size must be conservatively reduced to ensure stability. To overcome this limitation, we propose a novel Heterogeneous Stochastic Momentum Alternating Direction Method of Multipliers (HSM-ADMM). By integrating a recursive momentum estimator (STORM), HSM-ADMM achieves the optimal oracle complexity of $\mathcal{O}(\epsilon^{-1.5})$ to reach an $\epsilon$-stationary point, using a strictly single-loop structure and an $\mathcal{O}(1)$ mini-batch size. The core innovation is a node-specific adaptive step-size strategy that scales the proximal term according to local degree information. We show theoretically that this design completely decouples the algorithmic stability from global network properties, enabling robust and accelerated convergence across arbitrary connected topologies without requiring any global structural knowledge. Furthermore, HSM-ADMM transmits only a single primal variable per iteration, significantly reducing communication bandwidth compared to state-of-the-art gradient tracking algorithms. Extensive numerical experiments on distributed nonconvex learning tasks validate the superior efficiency of the proposed HSM-ADMM algorithm.
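The recursive momentum estimator named in the abstract can be illustrated with a minimal sketch of the generic STORM recursion, $d^k = \nabla f(x^k;\xi^k) + (1-a)\,(d^{k-1} - \nabla f(x^{k-1};\xi^k))$, where the same sample $\xi^k$ is evaluated at both iterates. This is not the HSM-ADMM update itself: the toy objective $f(x) = \tfrac{1}{2}\|x\|^2$, the plain gradient step, and the names `storm_update`, `stochastic_grad`, `run_storm`, `a`, and `lr` are all assumptions made for illustration.

```python
import numpy as np


def storm_update(d_prev, grad_new, grad_old, a):
    """One generic STORM step.

    grad_new and grad_old must be stochastic gradients computed with the
    SAME sample at the current and previous iterates, respectively.
    """
    return grad_new + (1.0 - a) * (d_prev - grad_old)


def stochastic_grad(x, noise):
    # Hypothetical toy oracle: gradient of 0.5 * ||x||^2 plus additive noise.
    return x + noise


def run_storm(steps=200, a=0.1, lr=0.05, seed=0):
    """Single-loop descent driven by the STORM estimator (illustrative only)."""
    rng = np.random.default_rng(seed)
    x_prev = np.ones(3)
    # Initialize the estimator with one plain stochastic gradient.
    d = stochastic_grad(x_prev, rng.normal(scale=0.1, size=3))
    x = x_prev - lr * d
    for _ in range(steps):
        # One fresh sample per iteration, reused at both points.
        noise = rng.normal(scale=0.1, size=3)
        d = storm_update(
            d,
            stochastic_grad(x, noise),
            stochastic_grad(x_prev, noise),
            a,
        )
        x_prev, x = x, x - lr * d
    return x, d


if __name__ == "__main__":
    x_final, d_final = run_storm()
    print("final iterate norm:", np.linalg.norm(x_final))
```

Setting `a = 1` recovers plain stochastic gradient descent, while smaller values reuse more of the previous estimate. The correction term `d_prev - grad_old` is what lets the estimator reduce variance with a constant batch size in a single loop.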

Paper Structure (21 sections, 5 theorems, 104 equations, 3 figures, 1 table, 1 algorithm)

Introduction
Preliminaries and Problem Formulation
Graph Theory
Problem Formulation
Assumptions
Algorithm Development
Augmented Lagrangian Function
Stochastic Recursive Momentum
Primal-Dual Updates
Convergence Analysis
Optimality Condition and Stationarity
Key Lemmas
Lyapunov Function and Descent Property
Global Convergence Rate
Numerical Examples
...and 6 more sections

Key Result

Lemma 1

Suppose that the assumptions of an undirected and connected graph, mean-squared smoothness, lower-bounded $f$ and $g$, and unbiasedness with bounded variance hold. Let $(\mathbf{x}^k, \mathbf{y}^k, \boldsymbol{\lambda}^k)$ be the sequence generated by Algorithm 1. For any constant $\theta > 0$, ... where $\Delta \boldsymbol{\lambda}^{k+1} \triangleq \boldsymbol{\lambda}^{k+1} - \boldsymbol{\lambda}^{k}$ ...

Figures (3)

Figure 1: Network topologies with 8 nodes
Figure 2: Performance comparison of distributed algorithms on a9a over a ring topology.
Figure 3: Performance comparison of distributed algorithms on MNIST over a random topology.

Theorems & Definitions (9)

Definition 1: $\epsilon$-stationary Point
Lemma 1
Remark 1
Lemma 2
Lemma 3
Theorem 1
Remark 2: Asymptotic Regularity
Theorem 2: Optimal Convergence Rate
Remark 3: Optimality and Topology Independence
