ProxSTORM -- A Stochastic Trust-Region Algorithm for Nonsmooth Optimization

Robert J. Baraldi, Aurya Javeed, Drew P. Kouri, Katya Scheinberg

TL;DR

ProxSTORM addresses the problem of minimizing a composite objective $f(x)+\varphi(x)$ where $f$ is stochastic and Lipschitz-smooth while $\varphi$ is convex and possibly nonsmooth. It extends the STORM stochastic trust-region framework by incorporating the proximal term via the proximal gradient $h(X_k)$ and by handling stochastic inexactness in both the model and reductions. The paper proves global convergence with a limit-type result and derives an $\mathcal{O}(\epsilon^{-2})$ expected complexity bound, under assumptions on model accuracy, proximal-reduction accuracy, and controlled decrease. Numerical experiments on $\ell^1$-regularized neural network training and a topology-optimization problem demonstrate the method's numerical viability and robustness to stochastic inexactness under convex constraints. Overall, ProxSTORM provides a practical, theoretically backed approach for nonsmooth stochastic composite optimization with provable convergence guarantees and competitive empirical performance.

Abstract

We develop a stochastic trust-region algorithm for minimizing the sum of a possibly nonconvex Lipschitz-smooth function that can only be evaluated stochastically and a nonsmooth, deterministic, convex function. This algorithm, which we call ProxSTORM, generalizes STORM [1, 2] -- a stochastic trust-region algorithm for the unconstrained optimization of smooth functions -- and the inexact deterministic proximal trust-region algorithm in [3]. We generalize and, in some cases, simplify the problem assumptions so that they reduce to a more succinct version of the assumptions on STORM when the convex term is zero. Our analysis follows the STORM framework by employing martingales, but again simplifies certain steps, and it proves global convergence and an expected complexity bound in the more general setting of a possibly nonsmooth term. To demonstrate that the method is numerically viable, we apply the algorithm to $\ell^1$-regularized neural network training and also to topology optimization.
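
To make the role of the proximal gradient concrete, the following is a minimal sketch for the special case $\varphi(x)=\lambda\|x\|_1$, as in the $\ell^1$-regularized training example. It uses one common definition of the proximal-gradient norm, $\frac{1}{t}\|\mathrm{prox}_{t\varphi}(x - t g) - x\|$, where $g$ stands in for a gradient estimate of $f$. The function names, the step parameter `t`, and the sample data are illustrative assumptions rather than the paper's notation, and the exact scaling of $h$ used in the paper may differ.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_gradient_norm(x, grad, lam, t=1.0):
    """Norm of the proximal-gradient mapping for phi = lam * ||.||_1.

    `grad` is a (possibly stochastic) estimate of the gradient of f at x;
    `t` is a hypothetical step parameter chosen for illustration.
    """
    step = soft_threshold(x - t * grad, t * lam) - x
    return np.linalg.norm(step) / t

# Illustrative data (not taken from the paper).
x = np.array([1.0, -0.5, 0.0])
grad = np.array([0.2, 0.1, 0.05])

# With lam = 0 the measure reduces to the ordinary gradient norm.
h_smooth = prox_gradient_norm(x, grad, lam=0.0)

# With lam > 0 the l1 term changes the stationarity measure.
h_l1 = prox_gradient_norm(x, grad, lam=0.1)
```

When the convex term is zero, this quantity coincides with the gradient norm, which mirrors how the assumptions are said to reduce to those of STORM in that case. With exact gradients, a value of zero characterizes a stationary point of $f+\varphi$.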

Paper Structure

This paper contains 18 sections, 16 theorems, 161 equations, 4 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Let If Assumption a:m1 holds, then for all $s\in\mathbb{R}^n$ with $\|s\|\le \delta_k$.

Figures (4)

  • Figure 1: Partitioning the probability space $\Omega$ into events. Here, $I_k$ and $J_k$ pertain to assumptions about the models and estimates of $f$, respectively.
  • Figure 2: Test error for 100 realizations of ProxSTORM (blue) and Adam (red) applied to the $\ell^1$-regularized training example \ref{eq:bc} for the HIGGS dataset \cite{higgs}.
  • Figure 3: Schematic of our topology example (left). The optimal design when approximating $\xi$ by its mean (middle) is qualitatively different from the optimal design under uncertainty (right).
  • Figure 4: Test error for 100 realizations of ProxSTORM (blue) and the deterministic algorithm \cite{baraldi.2022} (red) applied to the topology example \ref{eq:topopt2}.

Theorems & Definitions (36)

  • Lemma 1: Accurate Model Gradient $\Rightarrow$ Accurate Predicted Reduction
  • proof
  • Proposition 2: Applicability of Assumption \ref{a:new}
  • proof
  • Theorem 3: Expected $\Psi$ Decrease
  • Proposition 4: Accurate Computed Reduction $\Rightarrow$ $\Psi$ Decrease
  • proof
  • Proposition 5: Accurate Model $\Rightarrow$ Bounded $\Psi$ Increase
  • proof
  • Proposition 6: Inaccurate Model and Computed Reduction $\Rightarrow$ Bounded $\Psi$ Increase
  • ...and 26 more