Stochastic smoothing accelerated gradient method for general constrained nonsmooth convex composite optimization

Ruyu Wang, Chao Zhang

TL;DR

SSAG addresses general constrained nonsmooth convex composite optimization by smoothing the nonsmooth term and applying stochastic updates within an accelerated gradient framework. It accommodates several smoothing strategies (Nesterov smoothing, randomized smoothing, and inf-convolution smoothing) that produce differentiable surrogates whose gradients can be computed via expectations or convex combinations, so the nonsmooth term does not need an easily computable proximal operator. The authors prove that SSAG attains the best-known iteration complexity $\mathcal{O}(1/\epsilon)$ and the optimal SFO complexity $\mathcal{O}(1/\epsilon^2)$ using a variable sample size, and validate the method on distributionally robust optimization (DRO) problems, including a DRO-moment formulation. The results indicate dimension-independent performance under suitable smoothing choices and demonstrate practical efficiency on large-scale nonsmooth DRO problems.
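To make the phrase "gradients computed via convex combinations" concrete, here is a minimal sketch of Nesterov smoothing applied to a finite max of affine functions, $h(x) = \max_i (a_i^\top x + b_i)$. It assumes the entropy prox-function on the simplex, which is only one possible choice and is not necessarily the one used in the paper. The function name `smoothed_max` and the data `A`, `b`, `x`, `mu` are hypothetical and introduced for illustration only.

```python
import numpy as np

def smoothed_max(A, b, x, mu):
    """Entropy-smoothed surrogate of h(x) = max_i (A[i] @ x + b[i]).

    Assumption: prox-function d(u) = sum_i u_i log u_i + log m on the simplex,
    so that h(x) - mu * log(m) <= h_mu(x) <= h(x).
    """
    z = A @ x + b
    shift = z.max()                      # subtract the max for numerical stability
    w = np.exp((z - shift) / mu)
    total = w.sum()
    value = shift + mu * np.log(total) - mu * np.log(len(z))
    weights = w / total                  # softmax weights: nonnegative, sum to 1
    grad = A.T @ weights                 # convex combination of the rows of A
    return value, grad, weights

# Illustrative random data (not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)
x = rng.standard_normal(5)
mu = 0.1

value, grad, weights = smoothed_max(A, b, x, mu)
exact = np.max(A @ x + b)
print(f"exact max = {exact:.4f}, smoothed = {value:.4f}, gap bound = {mu * np.log(50):.4f}")
```

The smoothed value stays within $\mu \log m$ of the exact maximum, and its gradient is a weighted average of the $a_i$, so no proximal operator of $h$ is ever evaluated. Shrinking `mu` tightens the approximation but increases the Lipschitz constant of the gradient.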

Abstract

We propose a novel stochastic smoothing accelerated gradient (SSAG) method for general constrained nonsmooth convex composite optimization and analyze its convergence rates. The SSAG method accommodates various smoothing techniques and can handle a nonsmooth term whose proximal operator is not easy to compute or that lacks the linear max structure. To the best of our knowledge, this is the first stochastic-approximation-type method that treats the maximum of finitely many, but numerous, nonsmooth convex functions as a stochastic function, which significantly improves computational efficiency. We prove that the SSAG method simultaneously achieves the best-known order $\mathcal{O}(\frac{1}{\epsilon})$ of iteration complexity and the optimal order $\mathcal{O}(\frac{1}{\epsilon^2})$ of $\mathcal{SFO}$ complexity, using a variable sample size. Numerical results on an application arising from distributionally robust optimization demonstrate the effectiveness and efficiency of the proposed SSAG method.
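The two complexity orders fit together through a simple counting argument. The sketch below assumes, purely for illustration, a linearly growing sample size $N_k = k$ at iteration $k$; the schedule actually used in the paper is not stated in this summary. With $K = \mathcal{O}(1/\epsilon)$ iterations, the total number of stochastic first-order oracle calls is

$$
\sum_{k=1}^{K} N_k \;=\; \sum_{k=1}^{K} k \;=\; \frac{K(K+1)}{2} \;=\; \mathcal{O}(K^2) \;=\; \mathcal{O}\!\left(\frac{1}{\epsilon^2}\right).
$$

More generally, any schedule whose cumulative sample count over $K$ iterations is $\mathcal{O}(K^2)$ is consistent with the stated pair of orders.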

Paper Structure

This paper contains 13 sections, 12 theorems, 135 equations, 1 table, and 1 algorithm.

Key Result

Lemma 1

For $h$ in orip-2, assume that $\mathbf{H}(x,\xi)$ has a linear max structure in h for a.e. $\xi \in \Xi$, and there exists a constant $c_1>0$ such that $\|A_{\xi}\| \le c_1$ for a.e. $\xi\in \Xi$. Let $\tilde{h}_{\mu}(x) = \mathbb{E}_{\xi}\left[\mathbf{\tilde{H}}_{\mu}(x,\xi)\right]$ with ...

Theorems & Definitions (19)

  • Definition 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • Lemma 7
  • Lemma 8
  • Lemma 9
  • ...and 9 more