Stochastic First-Order Methods with Non-smooth and Non-Euclidean Proximal Terms for Nonconvex High-Dimensional Stochastic Optimization
Yue Xie, Jiawen Bi, Hongcheng Liu
TL;DR
The paper tackles high-dimensional stochastic nonconvex optimization by introducing dimension-insensitive stochastic first-order methods (DISFOMs) that permit non-Euclidean and non-smooth proximal terms. It provides two choices of non-smooth distance function, each admitting a closed-form proximal step, and analyzes two gradient estimators (minibatch and variance reduction), proving that DISFOMs achieve sample complexities of $\mathcal{O}((\log d)/\epsilon^4)$ and $\mathcal{O}((\log d)^{2/3}/\epsilon^{10/3})$, respectively, for obtaining an $\epsilon$-stationary point. The latter is possibly the best-known sample complexity in terms of the dimension $d$. Numerical experiments complement the theory, showing dimension stability and competitiveness against traditional proximal SGD/SVRG and SMD methods. Overall, the approach provides a scalable pathway for nonconvex stochastic optimization in settings where a linear dependence on the dimension would otherwise hinder performance.
Abstract
When a nonconvex problem is complicated by stochasticity, the sample complexity of stochastic first-order methods may depend linearly on the problem dimension, which is undesirable for large-scale problems. In this work, we propose dimension-insensitive stochastic first-order methods (DISFOMs) to address nonconvex optimization with an expected-valued objective function. Our algorithms allow for non-Euclidean and non-smooth distance functions as the proximal terms. Under mild assumptions, we show that DISFOM using minibatches to estimate the gradient enjoys a sample complexity of $\mathcal{O}((\log d)/\epsilon^4)$ to obtain an $\epsilon$-stationary point. Furthermore, we prove that DISFOM employing variance reduction can sharpen this bound to $\mathcal{O}((\log d)^{2/3}/\epsilon^{10/3})$, which is perhaps the best-known sample complexity result in terms of $d$. We provide two choices of the non-smooth distance functions, both of which allow for closed-form solutions to the proximal step. Numerical experiments are conducted to illustrate the dimension-insensitive property of the proposed frameworks.
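To give a feel for what "dimension-insensitive" means in these bounds, the following minimal sketch evaluates the two stated rates with all hidden constants dropped. The function names are illustrative and do not come from the paper, and the `linear_in_d_bound` baseline of the form $d/\epsilon^4$ is only an assumed stand-in for a rate that depends linearly on $d$. The numbers it prints are not actual sample counts.

```python
import math


def minibatch_bound(d, eps):
    """(log d) / eps^4 with constants dropped (minibatch DISFOM rate)."""
    return math.log(d) / eps ** 4


def variance_reduced_bound(d, eps):
    """(log d)^(2/3) / eps^(10/3) with constants dropped (variance-reduced rate)."""
    return math.log(d) ** (2.0 / 3.0) / eps ** (10.0 / 3.0)


def linear_in_d_bound(d, eps):
    """Hypothetical d / eps^4 baseline, assumed here only for contrast."""
    return d / eps ** 4


if __name__ == "__main__":
    eps = 0.1  # illustrative target accuracy
    for d in (10 ** 3, 10 ** 6):
        print(
            f"d={d:>8d}  "
            f"minibatch={minibatch_bound(d, eps):.3e}  "
            f"variance-reduced={variance_reduced_bound(d, eps):.3e}  "
            f"linear-in-d={linear_in_d_bound(d, eps):.3e}"
        )
```

Growing $d$ from $10^3$ to $10^6$ doubles the minibatch expression, since $\log(10^6)/\log(10^3) = 2$, whereas the assumed linear baseline grows by a factor of 1000.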
