A Stochastic Objective-Function-Free Adaptive Regularization Method with Optimal Complexity

Serge Gratton; Sadok Jerad; Philippe L. Toint

TL;DR

The paper introduces StOFFAR$p$, a fully stochastic, objective-function-free adaptive-regularization method for unconstrained nonconvex optimization. It builds a $p$th-order regularized Taylor model using inexact and stochastic derivatives whose accuracy is controlled by the history of past steps, avoiding any objective evaluations. The authors establish optimal evaluation-complexity bounds (notably $\mathcal{O}(\epsilon^{-3/2})$ for $p=2$) under a family of probabilistic derivative-error assumptions and provide practical sampling rules for inexact derivatives in finite-sum problems, with applications to large-scale machine-learning tasks. Numerical experiments on binary classification problems illustrate the potential of the approach, showing competitive performance against adaptive gradient baselines and highlighting the benefits and trade-offs of longer memory in the past-step framework.

Abstract

A fully stochastic second-order adaptive-regularization method for unconstrained nonconvex optimization is presented that never computes the objective-function value, yet achieves the optimal $\mathcal{O}(\epsilon^{-3/2})$ complexity bound for finding first-order critical points. The method is noise-tolerant, and the inexactness conditions required for convergence depend on the history of past steps. Applications to cases where derivative evaluation is inexact and to minimization of finite sums by sampling are discussed. Numerical experiments on large binary classification problems illustrate the potential of the new method.
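
For orientation, the kind of regularized Taylor model referred to above can be sketched as follows. This uses the standard notation of the adaptive-regularization literature and is not copied from the paper: the symbols $\overline{\nabla_x^i f}$ (an inexact $i$th derivative), $\sigma_k$ (the regularization parameter) and $s$ (the trial step) are assumptions made here, and the paper's own scaling constants and accuracy conditions may differ.

```latex
% Sketch under assumed notation: p-th order regularized Taylor model at x_k,
% built from inexact derivatives. The constant term f(x_k) is omitted because
% an objective-function-free method never evaluates it.
m_k(s) = \sum_{i=1}^{p} \frac{1}{i!}\, \overline{\nabla_x^i f}(x_k)[s]^i
         + \frac{\sigma_k}{(p+1)!}\, \|s\|^{p+1}

% Known evaluation-complexity order of p-th order adaptive regularization
% with exact derivatives, for reaching \|\nabla_x f(x_k)\| \le \epsilon:
\mathcal{O}\!\left(\epsilon^{-(p+1)/p}\right),
\qquad \text{which for } p = 2 \text{ gives } \mathcal{O}\!\left(\epsilon^{-3/2}\right).
```

Dropping $f(x_k)$ only shifts the model by a constant, so it does not change the minimizer over $s$. That is why a step can be computed without any objective evaluation.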

Paper Structure (18 sections, 14 theorems, 59 equations, 3 figures, 1 table)

Introduction
A Stochastic OFFO adaptive regularization algorithm
Problem Formulation
The OFFO algorithm with stochastic derivatives
Evaluation complexity for the inexact StOFFAR$p$ algorithm
Applications of the StOFFAR$p$ algorithm
Inexact Derivatives
Machine Learning Problems
Numerical illustration
Implementation Issues
Results
Discussion
Proof of \ref{skbound}
Solutions of the equation $\gamma_1 \log(u) + \gamma_2u + \gamma_3 = 0$
...and 3 more sections

Key Result

Lemma 3.1

Suppose that AS.1, AS.3 and AS.5 hold and let $\alpha > 0$. Then and where

Figures (3)

Figure 1: Performance profile of OFFAR2-m for SUSY and w8a for $\texttt{m} \in \{1,50,250,500\}$ and WNGRAD
Figure 2: Evolution of the loss function with respect to the epochs, and number of samples along iterations, for SUSY and w8a
Figure 3: Evolution of the loss function with respect to the epochs, and sampling behavior along iterations, for specific problems

Theorems & Definitions (14)

Lemma 3.1
Lemma 3.2
Lemma 3.3
Lemma 3.4
Lemma 3.5
Lemma 3.6
Lemma 3.7
Theorem 3.8
Corollary 3.1
Corollary 3.2
...and 4 more
