Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

Hoang Huy Nguyen; Yan Li; Tuo Zhao

TL;DR

This paper tackles decentralized constrained optimization under privacy and communication constraints by proposing an inexact primal-dual sliding (I-PDS) framework that is projection-free via a conditional gradient sliding (CGS) subroutine. It achieves topology-invariant gradient sampling while accommodating stochastic gradient oracles, and it offers a linear-optimization complexity of $O(1/\varepsilon^2)$ for both convex and strongly convex cases, independent of the graph's spectral gap. Theoretical results establish convergence guarantees with explicit gradient sampling and communication costs, and empirical experiments on logistic regression demonstrate reduced data oracle usage and robustness to noise across different graph topologies. The work contributes a practical, scalable approach for large-scale decentralized learning where data access and communication are costly, with potential impact on privacy-preserving distributed ML and networked optimization tasks.

Abstract

In modern decentralized applications, ensuring communication efficiency and user privacy are the key challenges. To train machine-learning models, the algorithm has to communicate with the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+\sigma^2/\varepsilon^2)$ and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for the convex and strongly convex settings, respectively, and an LO (Linear Optimization) complexity of $O(1/\varepsilon^2)$ for both settings, given a stochastic gradient oracle with variance $\sigma^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption that the optimal solution is a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.
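To make the notion of an LO (Linear Optimization) call concrete, the following is a minimal sketch of a plain conditional gradient (Frank-Wolfe) loop, the projection-free building block that conditional gradient sliding refines. It is not the paper's algorithm: the $\ell_1$-ball feasible set, the toy quadratic objective, the $2/(k+2)$ step size, and the names `lo_oracle_l1_ball` and `conditional_gradient` are all assumptions chosen for illustration.

```python
import numpy as np


def lo_oracle_l1_ball(gradient, radius=1.0):
    """Hypothetical LO oracle: argmin of <gradient, s> over ||s||_1 <= radius.

    A linear function over the l1 ball is minimized at a signed vertex,
    so one call costs a single pass over the gradient (no projection).
    """
    s = np.zeros_like(gradient, dtype=float)
    i = int(np.argmax(np.abs(gradient)))
    s[i] = -radius * np.sign(gradient[i])
    return s


def conditional_gradient(grad_fn, x0, num_iters=200, radius=1.0):
    """Classic Frank-Wolfe: one gradient call and one LO call per iteration."""
    x = np.array(x0, dtype=float)
    for k in range(num_iters):
        s = lo_oracle_l1_ball(grad_fn(x), radius)
        step = 2.0 / (k + 2)  # standard open-loop step size
        x = (1.0 - step) * x + step * s  # convex combination stays feasible
    return x


# Toy problem (assumed): minimize 0.5 * ||x - b||^2 over the unit l1 ball.
b = np.array([2.0, 0.0, 0.0])
x_hat = conditional_gradient(lambda x: x - b, x0=np.zeros(3))
print(x_hat)
```

Each iterate is a convex combination of feasible points, so feasibility holds without any projection step. In this plain loop every LO call also consumes a fresh gradient; sliding-type methods aim to reuse a gradient across several LO calls, which is why the abstract reports gradient complexity and LO complexity separately.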

Paper Structure (16 sections, 7 theorems, 63 equations, 2 figures, 4 tables, 2 algorithms)


Introduction
Related works
Our Contributions
Organization of the paper
Algorithms
Main results
Numerical experiments
Logistic regression experiments
The effects of graph topologies
Discussion
Discussions on the spectral gap of the communication network
Proof of key results
Linear oracle error bounds
Proof of Proposition \ref{pro: LO-error-proposition-stochastic}
Main results
...and 1 more section

Key Result

Theorem 3.1

Let $N$ denote the pre-determined number of outer iterations, and let $\tau:=2\sqrt{\tilde{L}/\mu}$ and $\Delta:=\lceil 2\tau + 1\rceil$ if $\mu>0$, and $\Delta:=+\infty$ if $\mu=0$. Suppose that Assumptions \ref{assumption: agent-smoothness} and \ref{assumption: stochastic-gradient-assumptions} hold and that $V(\cdot,\cdot) = \dots$ For all $k\ge \Delta+1$: $\dots$ And for all $k$ and $t$, $\dots$
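As a quick arithmetic illustration of the constants defined in the theorem, assume the hypothetical values $\tilde{L} = 100$ and $\mu = 1$ (these numbers are chosen for the example and do not come from the paper):

```latex
\[
\tau = 2\sqrt{\tilde{L}/\mu} = 2\sqrt{100/1} = 20,
\qquad
\Delta = \lceil 2\tau + 1 \rceil = \lceil 41 \rceil = 41 .
\]
```

Under these assumed values, the regime $k \ge \Delta + 1$ named in the theorem would begin at $k = 42$. When $\mu = 0$, $\Delta = +\infty$ and no finite $k$ falls in that regime.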

Figures (2)

Figure 1: The optimization model where the data has to be queried and the optimizer has to communicate with local workers. Grey arrows represent local data oracle access and red arrows represent the communication between the nodes.

Theorems & Definitions (13)

Definition 1
Definition 2
Theorem 3.1
proof : Proof Sketch
Lemma 3.2
Proposition 1
Lemma B.1
proof
Proposition 2
proof
...and 3 more
