First- and Second-Order Stochastic Adaptive Regularization with Cubics: High Probability Iteration and Sample Complexity
Katya Scheinberg, Miaolan Xie
TL;DR
This work tackles unconstrained nonconvex optimization where function values and derivatives are accessed through stochastic oracles. It develops two stochastic Adaptive Regularization with Cubics (SARC) algorithms, one first-order and one second-order, and proves that they achieve the deterministic $O(\varepsilon^{-3/2})$ iteration rate with high probability, while tolerating biased and otherwise arbitrary zeroth-order errors via an error-corrected acceptance rule. The analysis relies on the notion of true iterations, a stochastic-process framework, and tailored oracle-accuracy conditions. It establishes both high-probability and in-expectation iteration bounds, plus novel high-probability and in-expectation sample complexity results. The results show that the SARC variants retain the optimal iteration complexity and admit explicit, problem-dependent minibatch-based sample complexities for expectation minimization, highlighting their theoretical advantage over other stochastic adaptive methods.
Abstract
We present high-probability (and expectation) complexity bounds for two versions of stochastic adaptive regularization methods with cubics (SARC), also known as regularized Newton methods. The first algorithm aims to find first-order stationary points, while the second targets second-order optimality conditions. Both methods employ stochastic zeroth-, first-, and second-order oracles with specific accuracy and reliability requirements. These oracles, which have previously been used with other stochastic adaptive methods such as trust-region and line-search algorithms, are applicable to various optimization settings, including expected risk minimization and simulation optimization. In this paper, we establish the first high-probability iteration and sample complexity bounds for both first- and second-order SARC algorithms. Our analysis demonstrates that, as in the deterministic case, these methods outperform other stochastic adaptive methods.
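
For orientation, the cubic-regularized model that adaptive regularization with cubics minimizes at each iteration can be sketched as follows. The notation is the standard one for ARC and is not taken from the text above: $g_k$ and $H_k$ stand for the outputs of the first- and second-order oracles at the iterate $x_k$, $\sigma_k > 0$ is the adaptive regularization parameter, and $f^0$ denotes the zeroth-order oracle estimate. The correction term $2\varepsilon_f$ in the acceptance ratio is an assumption, shown as one common way to tolerate zeroth-order errors of size up to $\varepsilon_f$; the rule used in the paper may differ in its details.

$$
m_k(s) = f^0(x_k) + g_k^\top s + \tfrac{1}{2}\, s^\top H_k s + \tfrac{\sigma_k}{3}\,\|s\|^3,
\qquad
\rho_k = \frac{f^0(x_k) - f^0(x_k + s_k) + 2\varepsilon_f}{m_k(0) - m_k(s_k)}.
$$

In this sketch, a trial step $s_k$ that approximately minimizes $m_k$ is accepted when $\rho_k \ge \eta$ for a fixed $\eta \in (0,1)$; $\sigma_k$ is decreased after an accepted step and increased otherwise, as in deterministic ARC.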
