A multilevel stochastic regularized first-order method with application to finite sum minimization

Filippo Marini, Margherita Porcelli, Elisa Riccietti

TL;DR

This work addresses large-scale stochastic optimization by introducing MU$^{\ell}$STREG, a multilevel stochastic adaptive-regularization gradient method that builds a hierarchy of computable approximations of the objective in either the variable space or the function space. By alternating fine stochastic steps with cheaper coarse steps, the method reduces the cost per iteration while retaining convergence guarantees: the authors prove almost-sure convergence to a first-order stationary point under probabilistic accuracy assumptions on the models and estimates. The framework extends deterministic multilevel ideas and STORM to a stochastic setting in which the finest level of the hierarchy need not coincide with the original objective throughout the optimization, and it specializes to finite-sum minimization through hierarchical subsampling. Numerical experiments on binary classification show that MU$^{\ell}$STREG outperforms its one-level variants and is competitive with SVRG and Adagrad, which makes it attractive for large-scale learning tasks where full passes over the data are expensive.
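The hierarchical subsampling mentioned above can be illustrated on a toy finite sum. The sketch below is our own illustration, not the authors' construction: it assumes that the levels are nested random subsamples of the data, that each term is a scalar least-squares residual, and that the coarse model carries a first-order coherence correction of the kind used in classical multilevel schemes such as MG/OPT, so that its slope at $s=0$ matches the fine-level gradient estimate. All function names are hypothetical.

```python
import random


def make_data(n, seed=0):
    """Toy data for f(x) = (1/n) * sum_i 0.5 * (a_i * x - b_i)**2 (scalar x)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 2.0) for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return a, b


def nested_samples(n, sizes, seed=0):
    """Nested index sets S_0 ⊂ S_1 ⊂ ... of the given increasing sizes.

    Taking prefixes of one random permutation guarantees the nesting.
    """
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)
    return [perm[:m] for m in sizes]


def sub_f(x, idx, a, b):
    """Subsampled function value: average of f_i(x) over the indices in idx."""
    return sum(0.5 * (a[i] * x - b[i]) ** 2 for i in idx) / len(idx)


def sub_g(x, idx, a, b):
    """Subsampled derivative: average of f_i'(x) over the indices in idx."""
    return sum(a[i] * (a[i] * x - b[i]) for i in idx) / len(idx)


def coarse_model(x, idx_coarse, g_fine, a, b):
    """Coherence-corrected coarse model around x.

    m_H(s) = f_H(x + s) + (g_fine - f_H'(x)) * s, so that m_H'(0) == g_fine.
    """
    correction = g_fine - sub_g(x, idx_coarse, a, b)

    def m(s):
        return sub_f(x + s, idx_coarse, a, b) + correction * s

    def dm(s):
        return sub_g(x + s, idx_coarse, a, b) + correction

    return m, dm


# Usage: three levels built from 10, 40 and all 100 terms.
a, b = make_data(100)
levels = nested_samples(100, [10, 40, 100])
x = 0.3
g_fine = sub_g(x, levels[1], a, b)       # fine-level (subsampled) gradient estimate
m_H, dm_H = coarse_model(x, levels[0], g_fine, a, b)
print(abs(dm_H(0.0) - g_fine) < 1e-12)   # checks first-order coherence at s = 0
```

Because the correction is linear in $s$, it only shifts the slope of the coarse model, so minimizing $m_H$ costs about as much as minimizing the small subsampled function itself.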

Abstract

In this paper, we propose a multilevel stochastic framework for the solution of nonconvex unconstrained optimization problems. The proposed approach uses random regularized first-order models that exploit an available hierarchical description of the problem, either in the classical variable space or in the function space, meaning that approximations of the objective function with different levels of accuracy are available. We provide a convergence analysis showing almost-sure global convergence of the method to a first-order stationary point. The numerical behavior is tested on finite-sum minimization problems. Unlike classical deterministic multilevel schemes, our stochastic method does not require the finest approximation to coincide with the original objective function throughout the optimization process. This allows the cost of the approximations to be decreased significantly, for instance in data-fitting problems, where using all the data at each iteration can be avoided.
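A single iteration of a regularized first-order method of this family can be sketched as follows. This is a generic one-level AR1-type iteration with the usual ratio test, written for illustration only: the names `ar1_step`, `eta`, `gamma` and `sigma_min` are hypothetical, the function and gradient estimates are passed in as callables (they could be subsampled averages), and neither the coarse-level steps nor the probabilistic accuracy requirements of MU$^{\ell}$STREG are modeled.

```python
def ar1_step(x, sigma, f_est, g_est, eta=0.1, gamma=2.0, sigma_min=1e-8):
    """One iteration of a generic first-order adaptive-regularization method.

    Model: m(s) = f(x) + g^T s + (sigma / 2) * ||s||^2, minimized by s = -g / sigma,
    with predicted decrease m(0) - m(s) = ||g||^2 / (2 * sigma).
    f_est and g_est return (possibly subsampled) estimates of f and its gradient.
    Returns (new_x, new_sigma, accepted).
    """
    g = g_est(x)
    pred = sum(gi * gi for gi in g) / (2.0 * sigma)
    if pred == 0.0:                      # zero gradient estimate: nothing to do
        return x, sigma, True
    x_trial = [xi - gi / sigma for xi, gi in zip(x, g)]
    rho = (f_est(x) - f_est(x_trial)) / pred   # actual vs. predicted decrease
    if rho >= eta:                       # successful: accept, relax regularization
        return x_trial, max(sigma / gamma, sigma_min), True
    return x, sigma * gamma, False       # unsuccessful: reject, regularize more


# Usage on the quadratic f(x) = 0.5 * (x_1^2 + 4 * x_2^2) with exact estimates.
c = [1.0, 4.0]
f = lambda x: 0.5 * sum(ci * xi * xi for ci, xi in zip(c, x))
grad = lambda x: [ci * xi for ci, xi in zip(c, x)]

x, sigma = [1.0, 1.0], 1.0
for _ in range(50):
    x, sigma, accepted = ar1_step(x, sigma, f, grad)
print(x, sigma)
```

Accepting a step only when the observed decrease is at least a fraction `eta` of the predicted one, and otherwise increasing `sigma`, is what makes the regularization adaptive: `sigma` acts as an inverse step size that the method tunes on the fly instead of requiring the Lipschitz constant in advance.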

Paper Structure

This paper contains 18 sections, 13 theorems, 80 equations, 5 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

Let $h:\mathbb{R}^n\rightarrow\mathbb{R}$ be a continuously differentiable function with Lipschitz-continuous gradient, and let $L$ be the corresponding Lipschitz constant. Given its first-order truncated Taylor series at $x$, $T[h](s) := h(x)+\nabla_x h(x)^Ts$, it holds:
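The inequality itself is cut off in this summary. Under these hypotheses, the classical estimate (stated here for reference; the paper's exact statement may differ in form) follows from the fundamental theorem of calculus, the Cauchy–Schwarz inequality and $\|\nabla_x h(x+ts) - \nabla_x h(x)\| \le L\,t\,\|s\|$, with $\|\cdot\|$ the Euclidean norm:

```latex
h(x+s) - T[h](s) = \int_0^1 \big(\nabla_x h(x+ts) - \nabla_x h(x)\big)^T s \, dt,
\qquad
\big|h(x+s) - T[h](s)\big| \le \int_0^1 L\, t\, \|s\|^2 \, dt = \frac{L}{2}\,\|s\|^2 .
```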

Figures (5)

  • Figure 1: Sketch of a possible iteration scheme for MU$^\ell$STREG. Horizontal arrows represent fine steps.
  • Figure 2: Iteration scheme used in our implementation of MU$^\ell$STREG for the finite-sum minimization problem.
  • Figure 3: Nonlinear least-squares problem: cardinality of the sample set at the finest level for MU$^1$STREG and MU$^3$STREG, together with the full size $N$, along the iterations.
  • Figure 4: Nonlinear least-squares problem: objective function value versus the number of weighted evaluations of gradients and functions.
  • Figure 5: Nonlinear least-squares problem: classification accuracy on the testing set versus the number of weighted evaluations of gradients and functions.

Theorems & Definitions (29)

  • Example 1
  • Example 2
  • Remark 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Lemma 2
  • Proof
  • ...and 19 more