Online Optimization Perspective on First-Order and Zero-Order Decentralized Nonsmooth Nonconvex Stochastic Optimization

Emre Sahinoglu; Shahin Shahrampour

TL;DR

This work studies finite-time decentralized optimization of nonsmooth nonconvex stochastic objectives and introduces Multi Epoch Decentralized Online Learning (ME-DOL). It combines online-to-nonconvex conversion with randomized smoothing to derive $(\delta,\epsilon)$-stationarity guarantees in the smooth, nonsmooth first-order, and nonsmooth zero-order settings. In both nonsmooth settings the sample complexity is $O(\delta^{-1}\epsilon^{-3})$, which matches the centralized result. A key contribution is the explicit handling of network effects through the connectivity parameter $1-\rho$, which yields improved iteration complexity on well-connected networks. Experiments on nonconvex penalized SVM problems support the theoretical rates and compare behavior across network topologies and across zero-order and first-order oracles.
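To make the role of the connectivity parameter concrete, the following is a minimal sketch of one gossip-averaging round on a ring network. It is not taken from the paper: the equal 1/3 mixing weights, the helper names, and the reading of $\rho$ as the second-largest eigenvalue magnitude of the mixing matrix are all assumptions made for illustration.

```python
import math

def ring_gossip_matrix(n):
    """Symmetric doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Assumed weights: each agent averages itself and its two neighbors
    with weight 1/3 each.
    """
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i][j % n] = 1.0 / 3.0
    return W

def ring_rho(n):
    """Second-largest eigenvalue magnitude of the circulant matrix above.

    Its eigenvalues are 1/3 + (2/3) * cos(2 * pi * k / n) for k = 0..n-1;
    k = 0 gives the eigenvalue 1 of the all-ones (consensus) direction.
    """
    return max(abs(1.0 / 3.0 + (2.0 / 3.0) * math.cos(2.0 * math.pi * k / n))
               for k in range(1, n))

def gossip_step(W, x):
    """One communication round: every agent takes a weighted neighbor average."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]

def disagreement(x):
    """Euclidean distance of the agents' scalar values from their average."""
    mean = sum(x) / len(x)
    return math.sqrt(sum((v - mean) ** 2 for v in x))

if __name__ == "__main__":
    for n in (8, 16, 32):
        # Larger rings have rho closer to 1, i.e. a smaller gap 1 - rho.
        print(n, "agents: 1 - rho =", 1.0 - ring_rho(n))
```

Under these assumptions, one round preserves the network average and shrinks the disagreement by a factor of at most $\rho$, so a larger gap $1-\rho$ means faster consensus. Growing the ring drives $\rho$ toward 1, which is the kind of topology dependence the complexity bounds track.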

Abstract

We investigate the finite-time analysis of finding $(\delta,\epsilon)$-stationary points for nonsmooth nonconvex objectives in decentralized stochastic optimization. A set of agents aims to minimize a global function using only local information and communication over a network. We present a novel algorithm, called Multi Epoch Decentralized Online Learning (ME-DOL), for which we establish the sample complexity in various settings. First, using a recently proposed online-to-nonconvex technique, we show that our algorithm recovers the optimal convergence rate for smooth nonconvex objectives. We then extend our analysis to the nonsmooth setting, building on properties of randomized smoothing and Goldstein-subdifferential sets. We establish a sample complexity of $O(\delta^{-1}\epsilon^{-3})$, which to the best of our knowledge is the first finite-time guarantee for decentralized nonsmooth nonconvex stochastic optimization in the first-order setting (without weak convexity), matching its optimal centralized counterpart. We further prove the same rate for the zero-order oracle setting without using variance reduction.
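As an illustration of how randomized smoothing turns function values into gradient information, here is a minimal sketch of a two-point zero-order estimator. It follows a common construction in the gradient-free literature; whether ME-DOL uses exactly this form is an assumption, and the function names, the sample count, and the test objective are hypothetical.

```python
import math
import random

def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]

def two_point_estimate(f, x, delta, rng):
    """Single-sample estimate built from two function evaluations.

    Computes (d / (2 * delta)) * (f(x + delta * w) - f(x - delta * w)) * w
    with w uniform on the unit sphere. For Lipschitz f this is an unbiased
    estimate of the gradient of the smoothed surrogate
    f_delta(x) = E[f(x + delta * u)], u uniform in the unit ball.
    """
    d = len(x)
    w = sample_unit_sphere(d, rng)
    x_plus = [xi + delta * wi for xi, wi in zip(x, w)]
    x_minus = [xi - delta * wi for xi, wi in zip(x, w)]
    scale = d * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * wi for wi in w]

def averaged_estimate(f, x, delta, num_samples, seed=0):
    """Average independent two-point estimates to reduce the variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = two_point_estimate(f, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]

if __name__ == "__main__":
    # Hypothetical nonsmooth test objective: the l1 norm, which is
    # Lipschitz but not differentiable on the coordinate axes.
    l1 = lambda z: sum(abs(v) for v in z)
    print(averaged_estimate(l1, [1.0, -2.0, 0.0], delta=0.1, num_samples=5000))
```

The estimator only queries $f$ at two points per sample, so it applies even where $f$ has no gradient. The price is a variance that grows with the dimension $d$, which is why the sketch averages several samples.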

Paper Structure (24 sections, 9 theorems, 46 equations, 6 figures, 2 tables, 4 algorithms)

Introduction
Contributions
Highlights of Technical Analysis
Literature Review
Problem Setting
Stationarity Metric in Nonsmooth Analysis
Properties of Randomized Smoothing
Assumptions
Algorithm and Main Technical Results
Challenges in the Analysis of Decentralized Algorithm
Smooth Analysis
Challenges in Nonsmooth Analysis
Nonsmooth Analysis with First-Order Oracle
Nonsmooth Analysis with Zero-Order Oracle
Numerical Experiments
...and 9 more sections

Key Result

Proposition 1

[lin2022gradient] Suppose that the function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ is $L$-Lipschitz. Then, it holds that:

Figures (6)

Figure 1: Evaluation of the gradient norm in the first-order setting.
Figure 2: Evaluation of the gradient norm in the zero-order setting.
Figure 3: Ring graphs in the first-order setting.
Figure 4: Random graphs in the first-order setting.
Figure 5: Evaluation of the test accuracy of our algorithm and DGFM in the zero-order setting.
...and 1 more figures

Theorems & Definitions (23)

Remark 1
Definition 1
Definition 2
Definition 3
Proposition 1
Lemma 1
Proposition 2
Lemma 2
Remark 2
Remark 3
...and 13 more
