Stochastic Weakly Convex Optimization Beyond Lipschitz Continuity

Wenzhi Gao; Qi Deng

TL;DR

Based on new adaptive regularization strategies, it is shown that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate.

Abstract

This paper considers stochastic weakly convex optimization without the standard Lipschitz continuity assumption. Based on new adaptive regularization (stepsize) strategies, we show that a wide class of stochastic algorithms, including the stochastic subgradient method, preserve the $\mathcal{O} ( 1 / \sqrt{K})$ convergence rate with constant failure rate. Our analyses rest on rather weak assumptions: the Lipschitz parameter can be either bounded by a general growth function of $\|x\|$ or locally estimated through independent random samples.
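To make the idea of a growth-adapted regularization concrete, here is a minimal sketch, not the authors' algorithm. It runs a stochastic subgradient step $x^{k+1} = x^k - g^k / \gamma_k$ on the function from Figure 1, $f(x) = |e^x + e^{-x} - 3|$, and scales $\gamma_k$ by a growth function of $|x^k|$. The rule $\gamma_k = c \sqrt{K} \, G(|x^k|)$ with $G(t) = e^{t}$, the multiplicative noise model, and all function and parameter names are assumptions made for illustration only.

```python
import math
import random


def f(x):
    # Objective from Figure 1: not globally Lipschitz, grows exponentially.
    return abs(math.exp(x) + math.exp(-x) - 3.0)


def subgrad(x):
    # A subgradient of f at x (sign convention at the kink is arbitrary).
    inner = math.exp(x) + math.exp(-x) - 3.0
    sign = 1.0 if inner >= 0 else -1.0
    return sign * (math.exp(x) - math.exp(-x))


def growth(x):
    # Assumed growth function: |e^x - e^-x| <= e^|x| bounds the subgradient.
    return math.exp(abs(x))


def adaptive_subgradient(x0, num_iters=2000, base=1.0, noise=0.1, seed=0):
    # Hypothetical rule: gamma_k = base * sqrt(K) * growth(x_k), so each step
    # has length at most (1 + noise) / (base * sqrt(K)) wherever the iterate is.
    rng = random.Random(seed)
    x = x0
    best = f(x)
    for _ in range(num_iters):
        # Illustrative noise model: multiplicative perturbation of the subgradient.
        g = subgrad(x) * (1.0 + noise * rng.uniform(-1.0, 1.0))
        gamma = base * math.sqrt(num_iters) * growth(x)
        x = x - g / gamma
        best = min(best, f(x))
    return x, best


if __name__ == "__main__":
    x_final, f_best = adaptive_subgradient(5.0)
    print(x_final, f_best)
```

Because the regularization grows with the local subgradient bound, the step length stays bounded even when the run starts far from the minimizers $\pm\operatorname{arccosh}(1.5)$, where a fixed stepsize can overshoot badly.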

Paper Structure (53 sections, 23 theorems, 145 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Contributions
Other related works
Preliminaries
Notations
Envelope smoothing
Model-based optimization
Assumptions
Structure of the paper
Weakly convex optimization under standard Lipschitzness
Weakly convex optimization under generalized Lipschitzness
Stability of the iterations
Weakly convex optimization under unknown Lipschitzness
Reference Lipschitz continuity
Algorithm design and analysis
...and 38 more sections

Key Result

Lemma 3.1

Suppose that A1 to A5 as well as B1 hold. Then, given $\rho > \kappa + \tau$ and $\gamma_k > \rho$, $\mathbb{E}_k [\psi_{1 / \rho} (x^{k + 1})] \leq \psi_{1 / \rho} (x^k) -\tfrac{\rho (\rho - \tau - \kappa)}{2 (\gamma_k - \kappa)} \| \hat{x}^k - x^k \|^2 + \tfrac{2 \rho L_f^2}{(\gamma_k - \rho) (\gamma_
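The symbols $\psi_{1/\rho}$ and $\hat{x}^k$ in Lemma 3.1 are presumably the Moreau envelope and the proximal point that are standard in weakly convex optimization. The definitions below are the textbook ones, assumed here rather than quoted from the paper, with $\psi$ denoting the objective and $\rho$ larger than its weak convexity modulus:

$$
\psi_{1/\rho}(x) = \min_{y} \Big\{ \psi(y) + \tfrac{\rho}{2} \| y - x \|^2 \Big\}, \qquad
\hat{x} = \operatorname*{arg\,min}_{y} \Big\{ \psi(y) + \tfrac{\rho}{2} \| y - x \|^2 \Big\}, \qquad
\nabla \psi_{1/\rho}(x) = \rho \, (x - \hat{x}).
$$

Under these definitions $\| \hat{x}^k - x^k \| = \rho^{-1} \| \nabla \psi_{1/\rho}(x^k) \|$, so the negative term in the lemma is a multiple of the squared envelope gradient norm, the usual measure of near-stationarity in this setting.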

Figures (5)

Figure 1: $f(x) = |e^x + e^{-x} - 3|$ exhibits exponential growth as $\|x\| \rightarrow + \infty$
Figure 2: Problem $r_1$. Left two: $(\kappa,p_{\text{fail}})=(10,0.2)$; Right two: $(\kappa,p_{\text{fail}})=(10,0.3)$. x-axis: parameter $\theta$; y-axis: number of iterations. SGD denotes vanilla SGD; SGD-G denotes SGD adaptive to known Lipschitzness; SGD-R denotes SGD adaptive to unknown Lipschitzness. The same applies to SPL.
Figure 3: Problem $r_2$. Left two: $(\kappa,p_{\text{fail}})=(1,0.2)$; Right two: $(\kappa,p_{\text{fail}})=(10,0.3)$. x-axis: parameter $\theta$; y-axis: number of iterations.
Figure 4: Problem $r_3$. Left two: $(\kappa,p_{\text{fail}})=(1,0.2)$; Right two: $(\kappa,p_{\text{fail}})=(1,0.3)$. x-axis: parameter $\theta$; y-axis: number of iterations.
Figure 5: Left two: Problem $r_1$, $(\kappa,p_{\text{fail}})=(1,0.3)$; Right two: Problem $r_2$, $(\kappa,p_{\text{fail}})=(1,0.3)$. x-axis: parameter $\theta$; y-axis: number of iterations.

Theorems & Definitions (36)

Remark 1
Remark 2
Lemma 3.1
Theorem 3.1
Example 4.1: Phase retrieval
Example 4.2: Subgradient method
Lemma 4.1
Theorem 4.1
Lemma 4.2: Informal
Lemma 4.3
...and 26 more
