ProxSTORM -- A Stochastic Trust-Region Algorithm for Nonsmooth Optimization
Robert J. Baraldi, Aurya Javeed, Drew P. Kouri, Katya Scheinberg
TL;DR
ProxSTORM addresses the problem of minimizing a composite objective $f(x)+\varphi(x)$ where $f$ is stochastic and Lipschitz-smooth while $\varphi$ is convex and possibly nonsmooth. It extends the STORM stochastic trust-region framework by incorporating the proximal term via the proximal gradient $h(X_k)$ and by handling stochastic inexactness in both the model and reductions. The paper proves global convergence with a limit-type result and derives an $\mathcal{O}(\epsilon^{-2})$ expected complexity bound, under assumptions on model accuracy, proximal-reduction accuracy, and controlled decrease. Numerical experiments on $\ell^1$-regularized neural network training and a topology-optimization problem demonstrate the method's numerical viability and robustness to stochastic inexactness under convex constraints. Overall, ProxSTORM provides a practical, theoretically backed approach for nonsmooth stochastic composite optimization with provable convergence guarantees and competitive empirical performance.
Abstract
We develop a stochastic trust-region algorithm for minimizing the sum of a possibly nonconvex Lipschitz-smooth function that can only be evaluated stochastically and a nonsmooth, deterministic, convex function. This algorithm, which we call ProxSTORM, generalizes STORM [1, 2] -- a stochastic trust-region algorithm for the unconstrained optimization of smooth functions -- and the inexact deterministic proximal trust-region algorithm in [3]. We generalize and, in some cases, simplify problem assumptions so that they reduce to more succinct versions of the assumptions on STORM when the convex term is zero. Our analysis follows the STORM framework by employing martingales, but again simplifies certain steps and proves global convergence and an expected complexity bound in the more general setting of a possibly nonsmooth term. To demonstrate that the method is numerically viable, we apply the algorithm to $\ell^1$-regularized neural network training and also to topology optimization.
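To make the composite setting concrete, the sketch below computes a standard proximal-gradient stationarity measure for the $\ell^1$ case $\varphi(x) = \lambda \|x\|_1$, whose proximal operator is componentwise soft-thresholding. It is an illustration only: the function names, the step parameter `t`, and the scaling of the measure are assumptions made here, and the precise definition of the proximal gradient used in the paper may differ.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied componentwise to a list."""
    return [max(abs(vi) - tau, 0.0) * (1.0 if vi > 0 else -1.0) for vi in v]


def prox_gradient_norm(x, grad, lam, t=1.0):
    """Euclidean norm of (x - prox_{t*phi}(x - t*grad)) / t for phi = lam * ||.||_1.

    `grad` is a (possibly stochastic) estimate of the gradient of the smooth
    term at x. The value is zero exactly when x is stationary for
    f + phi with respect to that gradient estimate.
    """
    shifted = [xi - t * gi for xi, gi in zip(x, grad)]
    p = soft_threshold(shifted, t * lam)
    return sum(((xi - pi) / t) ** 2 for xi, pi in zip(x, p)) ** 0.5


if __name__ == "__main__":
    # Toy smooth term f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
    b = [3.0, 0.5, -2.0]
    lam = 1.0
    x_star = soft_threshold(b, lam)  # minimizer of f + lam * ||.||_1
    grad_at_star = [xi - bi for xi, bi in zip(x_star, b)]
    print(prox_gradient_norm(x_star, grad_at_star, lam))
    print(prox_gradient_norm([0.0, 0.0, 0.0], [-bi for bi in b], lam))
```

When $\varphi \equiv 0$ the proximal operator is the identity and the measure reduces to the gradient norm, which mirrors how the composite setting collapses to the smooth unconstrained one treated by STORM.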
