Accelerating Distributed Optimization: A Primal-Dual Perspective on Local Steps

Junchi Yang; Murat Yildirim; Qiu Feng

TL;DR

This paper tackles distributed optimization across multiple agents with heterogeneous data by formulating a primal–dual framework in which local primal updates require no inter-agent communication and a coordinated dual ascent step enforces global agreement. By applying Gradient Ascent Multiple Stochastic Gradient Descent (GA-MSGD) to the Lagrangian and coupling it with Catalyst acceleration, the authors achieve near-optimal communication complexities across strongly convex, convex, and nonconvex settings without requiring large minibatches, in both centralized and decentralized networks. A key theoretical insight is that the dual function is strongly concave when the coupling matrix has full rank or, in the decentralized case, when the dual variable is restricted to the span of $U=\sqrt{I-W}$, where $W$ is the network matrix. This enables linear convergence in the outer loop for strongly convex objectives, and hence fewer communication rounds. The framework unifies several existing methods under a minimax perspective and improves on prior rates for LED, Scaffnew/ProxSkip, and stochastic gradient tracking, which makes it attractive for scalable distributed learning.
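The strong-concavity claim can be motivated by a standard conjugate-duality argument. The sketch below is not taken from the paper: it assumes a generic constrained form $\min_x F(x)$ subject to $Ax = 0$, with $F$ being $\mu$-strongly convex and $L$-smooth, and with $A$ standing in for the coupling matrix ($A = U$ in the decentralized case). The symbols $A$, $\Psi$, $G$ (the convex conjugate of $F$), and $\sigma_{\min}^{+}$ (smallest nonzero singular value) are illustrative notation.

```latex
% Lagrangian and dual function for  min_x F(x)  s.t.  A x = 0
\[
\mathcal{L}(x,\lambda) = F(x) + \langle \lambda, A x \rangle,
\qquad
\Psi(\lambda) = \min_x \mathcal{L}(x,\lambda) = -G(-A^\top \lambda),
\qquad
G(u) = \sup_x \{ \langle u, x \rangle - F(x) \}.
\]
% F is L-smooth  =>  G is (1/L)-strongly convex.
% Hence, for all \lambda, \lambda' with \lambda - \lambda' \in \mathrm{range}(A):
\[
\langle \nabla\Psi(\lambda) - \nabla\Psi(\lambda'),\, \lambda - \lambda' \rangle
\le -\frac{1}{L}\, \| A^\top (\lambda - \lambda') \|^2
\le -\frac{\sigma_{\min}^{+}(A)^2}{L}\, \| \lambda - \lambda' \|^2 .
\]
```

Under these assumptions $\Psi$ is strongly concave on $\mathrm{range}(A)$, which is the whole dual space when $A$ has full row rank. Although $\mathcal{L}$ is only linear in $\lambda$, gradient ascent on $\Psi$ started in $\mathrm{range}(A)$ stays there and can therefore converge linearly.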

Abstract

In distributed machine learning, efficient training across multiple agents with different data distributions poses significant challenges. Even with a centralized coordinator, current algorithms that achieve optimal communication complexity typically require either large minibatches or compromise on gradient complexity. In this work, we tackle both centralized and decentralized settings across strongly convex, convex, and nonconvex objectives. We first demonstrate that a basic primal-dual method, (Accelerated) Gradient Ascent Multiple Stochastic Gradient Descent (GA-MSGD), applied to the Lagrangian of distributed optimization inherently incorporates local updates, because the inner loops of running Stochastic Gradient Descent on the primal variable require no inter-agent communication. Notably, for strongly convex objectives, (Accelerated) GA-MSGD achieves linear convergence in communication rounds despite the Lagrangian being only linear in the dual variables. This is due to a structural property where the dual variable is confined to the span of the coupling matrix, rendering the dual problem strongly concave. When integrated with the Catalyst framework, our approach achieves nearly optimal communication complexity across various settings without the need for minibatches.
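The structure described above (communication-free inner primal loops, one communicated dual step per round) can be illustrated on a toy problem. The following is a minimal sketch and not the authors' algorithm listing: it assumes scalar quadratic local objectives $f_m(x) = \tfrac{1}{2} a_m (x - b_m)^2$, a dual variable $\lambda_m$ per agent kept on the subspace $\sum_m \lambda_m = 0$, and hypothetical names and parameters (`ga_msgd_sketch`, `tau`, `eta`, `noise_std`).

```python
import random


def ga_msgd_sketch(a, b, rounds=100, local_steps=50, tau=0.2,
                   eta=None, noise_std=0.0, seed=0):
    """Toy dual-ascent / local-SGD loop for f_m(x) = 0.5 * a[m] * (x - b[m])**2.

    Assumptions (not from the paper): scalar variables, quadratic losses,
    and a dual step size eta = min(a), which is stable for this toy problem.
    """
    rng = random.Random(seed)
    num_agents = len(a)
    eta = min(a) if eta is None else eta
    x = [0.0] * num_agents      # local primal copies, one per agent
    lam = [0.0] * num_agents    # dual variables, kept on sum(lam) == 0

    for _ in range(rounds):
        # Inner loops: each agent runs (stochastic) gradient steps on
        # f_m(x_m) + lam_m * x_m using only its own data. No communication.
        for m in range(num_agents):
            for _ in range(local_steps):
                grad = a[m] * (x[m] - b[m]) + lam[m] + rng.gauss(0.0, noise_std)
                x[m] -= tau * grad

        # One communication round: average the local copies ...
        x_bar = sum(x) / num_agents
        # ... then take a gradient-ascent step on the dual variables.
        # The increments sum to zero, so sum(lam) stays zero.
        for m in range(num_agents):
            lam[m] += eta * (x[m] - x_bar)

    return x, lam, sum(x) / num_agents
```

With `noise_std=0.0` the local copies agree and approach the minimizer of the average objective, $\sum_m a_m b_m / \sum_m a_m$. Setting `noise_std > 0` with this constant step size leaves a noise floor, so a decaying `tau` would be needed for exact convergence.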

Paper Structure (42 sections, 21 theorems, 35 equations, 3 tables, 6 algorithms)

Introduction
Our Contributions.
Related Work
Strongly convex:
Convex:
Nonconvex:
(Primal)-Dual Algorithms:
Preliminary
Notation and Acronym
Centralized Optimization
A Primal-Dual Algorithm
Convergence Analysis for Centralized GA-MSGD
Catalyst Acceleration for GA-MSGD
Catalyst Acceleration for SCAFFOLD
Decentralized Optimization
...and 27 more sections

Key Result

Theorem 1

Assume that $F$ is $L$-smooth. Define $F^* = \min_x F(x)$ and $\Delta = F(x^0) - F^*$. The following convergence guarantees hold for Algorithm APPA with the specified subproblem accuracies $\{\epsilon^s\}_s$ under different convexity assumptions on $F$.
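The theorem concerns an outer algorithm that solves a sequence of subproblems to accuracy $\epsilon^s$; its listing is not reproduced here. As a rough illustration of this kind of scheme, the sketch below implements a Catalyst-style inexact accelerated proximal-point loop for a strongly convex toy objective. Everything specific is an assumption: the separable quadratic $F(x) = \sum_i \tfrac{1}{2} h_i (x_i - c_i)^2$, plain gradient descent as the inner solver with a fixed step count instead of an accuracy schedule, and the names `catalyst_sketch` and `kappa`.

```python
import math


def catalyst_sketch(h, c, kappa=1.0, outer_iters=100, inner_steps=200):
    """Inexact accelerated proximal point on F(x) = sum_i 0.5*h[i]*(x[i]-c[i])**2.

    Assumptions (illustrative only): mu = min(h), L = max(h), constant
    momentum for the strongly convex case, gradient descent as inner solver.
    """
    mu, smooth = min(h), max(h)
    q = mu / (mu + kappa)
    beta = (1.0 - math.sqrt(q)) / (1.0 + math.sqrt(q))  # extrapolation weight
    step = 1.0 / (smooth + kappa)                       # inner step size
    n = len(h)
    x = [0.0] * n
    y = list(x)  # extrapolated point used as the prox center

    for _ in range(outer_iters):
        x_prev = list(x)
        # Subproblem: approximately minimize F(z) + (kappa/2) * ||z - y||^2,
        # warm-started at the previous iterate.
        z = list(x)
        for _ in range(inner_steps):
            for i in range(n):
                grad = h[i] * (z[i] - c[i]) + kappa * (z[i] - y[i])
                z[i] -= step * grad
        x = z
        # Momentum step on the outer sequence.
        y = [x[i] + beta * (x[i] - x_prev[i]) for i in range(n)]

    return x
```

The regularization weight `kappa` trades off the two loops: a larger value makes each subproblem better conditioned and cheaper to solve, but slows the outer loop.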

Theorems & Definitions (26)

Theorem 1
Theorem 2: Outer-loop Complexity
Proposition 1
Lemma 1: Inner-loop Complexity
Corollary 1: Total Complexity
Corollary 2
Remark 1
Remark 2
Theorem 3: Karimireddy et al., 2020 (SCAFFOLD)
Corollary 3
...and 16 more
