First- and Second-Order Stochastic Adaptive Regularization with Cubics: High Probability Iteration and Sample Complexity

Katya Scheinberg, Miaolan Xie

TL;DR

This work tackles unconstrained nonconvex optimization where function values and derivatives are accessed through stochastic oracles. It develops two stochastic Adaptive Regularization with Cubics (SARC) algorithms, one first-order and one second-order, and proves that they achieve the deterministic $O(\varepsilon^{-3/2})$ iteration rate with high probability, while handling biased and arbitrary zeroth-order errors via an error-corrected acceptance rule. The analysis introduces true-iteration concepts, stochastic process framing, and tailored oracle-accuracy settings, establishing both high-probability and in-expectation iteration bounds, plus novel high-probability and expectation-based sample complexity results. The results show that SARC variants retain optimal iteration complexity and provide explicit, problem-dependent minibatch-based sample complexities for expectation minimization, highlighting their theoretical advantage over other stochastic adaptive methods.

Abstract

We present high-probability (and expectation) complexity bounds for two versions of stochastic adaptive regularization methods with cubics (SARC), also known as regularized Newton methods. The first algorithm aims to find first-order stationary points, while the second targets second-order optimality conditions. Both methods employ stochastic zeroth-, first-, and second-order oracles with specific accuracy and reliability requirements. These oracles, which have previously been used with other stochastic adaptive methods such as trust-region and line-search algorithms, are applicable to various optimization settings, including expected risk minimization and simulation optimization. In this paper, we establish the first high-probability iteration and sample complexity bounds for both first- and second-order SARC algorithms. Our analysis demonstrates that, as in the deterministic case, these methods outperform other stochastic adaptive methods.
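To make the interplay between the stochastic oracles and an error-corrected acceptance rule more concrete, here is a minimal one-dimensional sketch. It is not the paper's algorithm: the names `cubic_step`, `sarc_1d`, `eps_f`, `eta`, `gamma`, and `sigma_min`, the specific update constants, the test function, and the form of the correction (adding `2 * eps_f` to the observed decrease) are all assumptions chosen for illustration.

```python
import math
import random


def cubic_step(g, h, sigma):
    """Global minimizer of m(s) = g*s + 0.5*h*s**2 + (sigma/3)*|s|**3 in 1-D."""
    # The step length t >= 0 solves -|g| + h*t + sigma*t**2 = 0 (descent direction).
    t = (-h + math.sqrt(h * h + 4.0 * sigma * abs(g))) / (2.0 * sigma)
    return -t if g > 0 else t


def sarc_1d(f_noisy, g_noisy, h_noisy, x0, eps_f,
            sigma0=1.0, sigma_min=1e-3, eta=0.1, gamma=2.0, iters=100):
    """Illustrative adaptive cubic regularization loop with noisy oracles."""
    x, sigma = x0, sigma0
    for _ in range(iters):
        g, h = g_noisy(x), h_noisy(x)            # first- and second-order oracles
        s = cubic_step(g, h, sigma)
        model_decrease = -(g * s + 0.5 * h * s * s + sigma / 3.0 * abs(s) ** 3)
        if model_decrease <= 0.0:                # model predicts no progress
            break
        # Assumed error-corrected test: pad the observed decrease by 2*eps_f
        # so that bounded zeroth-order noise alone cannot force a rejection.
        rho = (f_noisy(x) - f_noisy(x + s) + 2.0 * eps_f) / model_decrease
        if rho >= eta:                           # successful: accept, relax sigma
            x, sigma = x + s, max(sigma / gamma, sigma_min)
        else:                                    # unsuccessful: tighten sigma
            sigma *= gamma
    return x


# Hypothetical test problem: f(x) = x^4/4 - x^2/2 with bounded oracle noise.
rng = random.Random(0)
eps_f = 1e-3
f = lambda x: 0.25 * x ** 4 - 0.5 * x ** 2 + rng.uniform(-eps_f, eps_f)
g = lambda x: x ** 3 - x + rng.uniform(-1e-3, 1e-3)
h = lambda x: 3.0 * x ** 2 - 1.0

x_final = sarc_1d(f, g, h, x0=2.0, eps_f=eps_f)
print(x_final)
```

Started from `x0 = 2.0`, the iterates in this toy setup should settle close to the local minimizer at $x = 1$. The one-dimensional subproblem is solved in closed form here only to keep the sketch short; in higher dimensions the cubic model would need an iterative subproblem solver.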

Paper Structure

This paper contains 9 sections, 19 theorems, and 80 equations.

Key Result

Lemma 1

Consider any realization of Algorithm alg:ARC_Random. For each iteration $k$, we have [equation missing]. On every successful iteration $k$, we have [equation missing], which implies [equation missing].

Theorems & Definitions (41)

  • Definition 1: True iteration
  • Remark 1
  • Lemma 1: Improvement on successful iterations
  • proof
  • Lemma 2: Large $\sigma_k$ guarantees success or small step
  • proof
  • Lemma 3: Lower bound on step norm in terms of $\|\nabla \phi(x_k^+)\|$
  • proof
  • Lemma 4: Lower bound on step norm until $\epsilon$-accuracy is reached
  • proof
  • ...and 31 more