Stochastic smoothing accelerated gradient method for general constrained nonsmooth convex composite optimization

Ruyu Wang; Chao Zhang

TL;DR

SSAG addresses general constrained nonsmooth convex composite optimization by smoothing the nonsmooth term and applying stochastic updates within an accelerated gradient framework. It supports several smoothing strategies (Nesterov smoothing, randomized smoothing, and inf-convolution smoothing). Each produces a differentiable surrogate whose gradient can be computed via an expectation or a convex combination, so the nonsmooth term does not need an easily computable proximal operator. The authors prove that, with a variable sample size, SSAG attains the best-known iteration complexity $O(1/ε)$ and SFO (stochastic first-order oracle) complexity $O(1/ε^2)$. They validate the method on distributionally robust optimization (DRO) problems, including a DRO-moment formulation. The results indicate dimension-independent performance under suitable smoothing choices and demonstrate practical efficiency on large-scale nonsmooth DRO problems.
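To make the two kinds of gradient formulas mentioned above concrete, here is a minimal NumPy sketch built on our own assumptions; it is not taken from the paper. The first function applies entropy (log-sum-exp) smoothing to a pointwise maximum of affine functions. This is Nesterov's smoothing with an entropy prox-function, up to an additive constant, and its gradient is a softmax-weighted convex combination of the pieces' gradients. The second function estimates the gradient of the Gaussian randomized smoothing $h_\mu(x)=\mathbb{E}_u[h(x+\mu u)]$ as a Monte Carlo expectation. The names `lse_smoothing` and `gaussian_smoothing_grad` are hypothetical.

```python
import numpy as np


def lse_smoothing(x, A, b, mu):
    """Entropy (log-sum-exp) smoothing of h(x) = max_i (A[i] @ x + b[i]).

    Returns the surrogate value, its gradient, and the softmax weights p.
    The gradient A.T @ p is a convex combination of the rows of A, and
    max_i z_i <= value <= max_i z_i + mu * log(N) with z = A @ x + b.
    """
    z = (A @ x + b) / mu
    zmax = z.max()
    w = np.exp(z - zmax)          # shift by the max for numerical stability
    p = w / w.sum()               # softmax weights: nonnegative, sum to 1
    value = mu * (zmax + np.log(w.sum()))
    return value, A.T @ p, p


def gaussian_smoothing_grad(h, x, mu, n_samples, rng):
    """Monte Carlo estimate of grad h_mu(x), where h_mu(x) = E_u[h(x + mu*u)]
    and u ~ N(0, I), using the unbiased estimator (h(x + mu*u) - h(x)) / mu * u.
    `h` must map an (n, d) array of points to an (n,) array of values."""
    u = rng.standard_normal((n_samples, x.size))
    diffs = (h(x + mu * u) - h(x[None, :])) / mu   # shape (n_samples,)
    return (diffs[:, None] * u).mean(axis=0)


rng = np.random.default_rng(0)
A, b = rng.standard_normal((5, 3)), rng.standard_normal(5)
x = np.array([1.0, -2.0, 0.5])
value, grad, p = lse_smoothing(x, A, b, mu=0.1)

# For h(x) = ||x||^2 / 2 we have h_mu(x) = h(x) + mu^2 * d / 2, so grad h_mu(x) = x.
sq = lambda X: 0.5 * np.sum(X**2, axis=-1)
g_mc = gaussian_smoothing_grad(sq, x, mu=0.1, n_samples=200_000, rng=rng)
```

The log-sum-exp gradient needs every piece $A_i x + b_i$ to be evaluated. The randomized estimator needs only function values, at the cost of Monte Carlo noise that shrinks with `n_samples`.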

Abstract

We propose a novel stochastic smoothing accelerated gradient (SSAG) method for general constrained nonsmooth convex composite optimization and analyze its convergence rates. The SSAG method accommodates various smoothing techniques. It can handle a nonsmooth term whose proximal operator is not easy to compute, or one that lacks a linear max structure. To the best of our knowledge, this is the first stochastic approximation type method that treats the maximum of a finite but large number of nonsmooth convex functions as a stochastic function, which significantly improves computational efficiency. We prove that, with a variable sample size, the SSAG method simultaneously achieves the best-known order ${\cal{O}}(\frac{1}{ε})$ of iteration complexity and the optimal order ${\cal{O}}(\frac{1}{ε^2})$ of $\cal{SFO}$ complexity. Numerical results on an application arising from distributionally robust optimization demonstrate the effectiveness and efficiency of the proposed SSAG method.
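Pairing ${\cal{O}}(\frac{1}{ε})$ iterations with ${\cal{O}}(\frac{1}{ε^2})$ SFO calls is consistent with a mini-batch that grows linearly in the iteration counter $k$, because $\sum_{k\le K} k = O(K^2)$. The sketch below is a hypothetical toy and not the paper's SSAG algorithm. It runs a FISTA-style accelerated projected gradient loop on a Huber (Nesterov-smoothed) surrogate of $\min_{x\in[-1,1]^d}\mathbb{E}_\xi\|x-\xi\|_1$ with $\xi\sim N(m,I)$. The batch size is $N_k = 10(k+1)$ and the smoothing parameter μ is held fixed for simplicity. The test problem, the parameter values, and the name `smoothed_accelerated_sgd` are our assumptions.

```python
import numpy as np


def huber_grad(t, mu):
    """Gradient of the Huber function, i.e. Nesterov's smoothing of |t|
    with prox-function u**2/2 over u in [-1, 1]; it is (1/mu)-Lipschitz."""
    return np.clip(t / mu, -1.0, 1.0)


def smoothed_accelerated_sgd(center, mu=0.05, iters=400, batch0=10, seed=0):
    """Toy smoothed accelerated stochastic gradient loop (hypothetical sketch).

    Minimizes E_xi ||x - xi||_1 over the box [-1, 1]^d with xi ~ N(center, I)
    by replacing |.| with its Huber smoothing (gradient Lipschitz constant
    L_mu = 1/mu), taking FISTA-style extrapolated projected steps of size
    1/L_mu, and drawing N_k = batch0 * (k + 1) samples at iteration k.
    Returns the final iterate and the total number of samples drawn.
    """
    rng = np.random.default_rng(seed)
    d = center.size
    x_prev = x = np.zeros(d)
    t_prev = 1.0
    total_samples = 0
    for k in range(iters):
        t = (1.0 + np.sqrt(1.0 + 4.0 * t_prev**2)) / 2.0
        y = x + ((t_prev - 1.0) / t) * (x - x_prev)       # extrapolation
        n_k = batch0 * (k + 1)                             # variable sample size
        xi = rng.normal(loc=center, scale=1.0, size=(n_k, d))
        g = huber_grad(y - xi, mu).mean(axis=0)            # mini-batch gradient
        x_prev, x = x, np.clip(y - mu * g, -1.0, 1.0)      # projected step
        t_prev = t
        total_samples += n_k
    return x, total_samples


center = np.array([0.3, -0.5])
x_hat, n_used = smoothed_accelerated_sgd(center)
```

The Huber surrogate and the Gaussian noise are both symmetric, so the smoothed problem has the same minimizer as the original one: the coordinate-wise median $m$, which lies inside the box. The returned iterate should therefore land near `center`. With `iters=400`, the loop draws $10\cdot\sum_{k=1}^{400}k = 802{,}000$ samples in total.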

Paper Structure (13 sections, 12 theorems, 135 equations, 1 table, 1 algorithm)

Introduction
Smoothing functions
Smoothing approximations
Stochastic gradients and fundamental assumptions
Smoothing properties
SSAG method
Applications and numerical results
DRO-moment problem
Conclusions
Proof of the smoothing functions
Nesterov's smoothing
Randomized smoothing
Inf-conv smoothing

Key Result

Lemma 1

For $h$ in orip-2, assume that $\mathbf{H}(x,\xi)$ has a linear max structure in $h$ for a.e. $\xi \in \Xi$, and that there exists a constant $c_1>0$ such that $\|A_{\xi}\| \le c_1$ for a.e. $\xi\in \Xi$. Let $\tilde{h}_{\mu}(x) = \mathbb{E}_{\xi}\left[\tilde{\mathbf{H}}_{\mu}(x,\xi)\right]$ with …
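The statement of Lemma 1 is truncated here, so we do not reproduce its conclusion. As an illustration of the kind of setting it assumes, the sketch below takes $\mathbf{H}(x,\xi)=\|A_\xi x\|_1=\max_{\|u\|_\infty\le 1}\langle A_\xi x,u\rangle$, a standard linear max structure, and applies Nesterov's smoothing with prox-function $\frac12\|u\|^2$. Each smoothed term is then a sum of Huber functions with gradient $A_\xi^\top \mathrm{clip}(A_\xi x/\mu,-1,1)$. Measured in the spectral norm, this gradient is $(\|A_\xi\|^2/\mu)$-Lipschitz, so $\|A_\xi\|\le c_1$ gives the bound $c_1^2/\mu$ for any average over $\xi$. The choice of $\mathbf{H}$, the sample-average stand-in for $\mathbb{E}_\xi$, and the helper names are our assumptions.

```python
import numpy as np


def huber(t, mu):
    """Nesterov smoothing of |t| with prox-function u**2/2 on u in [-1, 1]."""
    a = np.abs(t)
    return np.where(a <= mu, t**2 / (2.0 * mu), a - mu / 2.0)


def smoothed_sample_average(x, As, mu):
    """Value and gradient of (1/S) * sum_s H_mu(x, xi_s), where
    H(x, xi_s) = ||A_s x||_1 and As has shape (S, m, n)."""
    Ax = As @ x                                   # shape (S, m)
    value = huber(Ax, mu).sum(axis=1).mean()
    u_star = np.clip(Ax / mu, -1.0, 1.0)          # maximizer of the smoothed inner problem
    grad = np.einsum("smn,sm->n", As, u_star) / As.shape[0]
    return value, grad


rng = np.random.default_rng(1)
S, m, n, c1, mu = 50, 4, 3, 2.0, 0.1
As = rng.standard_normal((S, m, n))
As *= c1 / np.linalg.norm(As, ord=2, axis=(1, 2))[:, None, None]  # rescale so ||A_s||_2 = c1

x, y = rng.standard_normal(n), rng.standard_normal(n)
hx_mu, gx = smoothed_sample_average(x, As, mu)
_, gy = smoothed_sample_average(y, As, mu)
hx = np.abs(As @ x).sum(axis=1).mean()            # unsmoothed sample average at x
```

Besides the Lipschitz bound, this choice of prox-function gives the uniform approximation $0 \le h - h_\mu \le \mu m/2$, since $0 \le |t| - \mathrm{huber}_\mu(t) \le \mu/2$ for every coordinate.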

Theorems & Definitions (19)

Definition 1
Lemma 1
Lemma 2
Lemma 3
Lemma 4
Lemma 5
Lemma 6
Lemma 7
Lemma 8
Lemma 9
...and 9 more