Negative Curvature Methods with High-Probability Complexity Guarantees for Stochastic Nonconvex Optimization

Albert S. Berahas, Raghu Bollapragada, Wanping Dong

Abstract

This paper develops negative curvature methods for continuous nonlinear unconstrained optimization in stochastic settings, in which function, gradient, and Hessian information is available only through probabilistic oracles, i.e., oracles that return approximations of a certain accuracy and reliability. We introduce conditions on these oracles and design a two-step framework that systematically combines gradient and negative curvature steps. The framework employs an early-stopping mechanism to guarantee sufficient progress and uses an adaptive mechanism based on an Armijo-type criterion to select the step sizes for both steps. We establish high-probability iteration-complexity guarantees for attaining second-order stationary points, deriving explicit tail bounds that quantify the convergence neighborhood and its dependence on oracle noise. Importantly, these bounds match deterministic rates up to noise-dependent terms, and the framework recovers the deterministic results as a special case. Finally, numerical experiments demonstrate the practical benefits of exploiting negative curvature directions even in the presence of noise.
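
To make the two-step idea concrete, the following is a minimal sketch of one iteration that takes a gradient step followed by a negative curvature step, each accepted through a relaxed Armijo-type test. It is not the authors' algorithm: the function name `two_step_iteration`, the constants `c`, the doubling/halving step-size rule, the form of the curvature decrease test, and the use of `e_f` as an additive slack in the acceptance test are all assumptions made for illustration, and the early-stopping mechanism is omitted. The demo uses exact oracles on a hypothetical test function with a saddle at the origin; noisy oracles could be passed in their place.

```python
import numpy as np

def two_step_iteration(f, grad, hess, x, alpha, beta, e_f=0.0, c=1e-4):
    """One illustrative iteration: gradient step, then negative curvature step.

    f, grad, hess are (possibly noisy) oracles; e_f is an assumed additive
    slack that relaxes the sufficient-decrease tests to tolerate noise.
    """
    # Gradient step with a relaxed Armijo-type acceptance test.
    g = grad(x)
    trial = x - alpha * g
    if f(trial) <= f(x) - c * alpha * g.dot(g) + e_f:
        x, alpha = trial, 2.0 * alpha      # success: accept and enlarge step
    else:
        alpha = 0.5 * alpha                # failure: reject and shrink step

    # Negative curvature step: only taken if the smallest eigenvalue is negative.
    eigvals, eigvecs = np.linalg.eigh(hess(x))   # eigenvalues in ascending order
    lam = eigvals[0]
    if lam < 0.0:
        d = eigvecs[:, 0]
        if grad(x).dot(d) > 0.0:           # orient d so it is not an ascent direction
            d = -d
        trial = x + beta * d
        if f(trial) <= f(x) + c * beta**2 * lam + e_f:
            x, beta = trial, 2.0 * beta
        else:
            beta = 0.5 * beta
    return x, alpha, beta

# Hypothetical test problem: saddle at (0, 0), minimizers at (+1, 0) and (-1, 0).
def f(x):
    return 0.25 * x[0]**4 - 0.5 * x[0]**2 + 0.5 * x[1]**2

def grad(x):
    return np.array([x[0]**3 - x[0], x[1]])

def hess(x):
    return np.diag([3.0 * x[0]**2 - 1.0, 1.0])

# Start on the line x[0] = 0, where gradient steps alone head to the saddle.
x, alpha, beta = np.array([0.0, 0.5]), 1.0, 1.0
for _ in range(20):
    x, alpha, beta = two_step_iteration(f, grad, hess, x, alpha, beta)
```

In this demo the gradient step alone would stall at the saddle point, since the gradient has no component along the first coordinate; the negative curvature step is what moves the iterate toward one of the two minimizers.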

Paper Structure

This paper contains 11 sections, 13 theorems, 92 equations, 3 figures, and 1 algorithm.

Key Result

Lemma 2.6

Let $p_f = 1 - 3 \exp(-a(\tfrac{e_f}{2} - \epsilon_f) )$, where $a$ is the positive constant from the zeroth-order subexponential oracle definition. Then, the indicator variables $I_k^f$ and $\hat{I}_k^f$ satisfy the submartingale condition, i.e., $\mathbb{P}\left[I_k^f = 1 | \mathcal{F}_{k-1}' \ri

Figures (3)

  • Figure 1: Sensitivity of Algorithm 1 on the Rosenbrock problem for $\epsilon_f \in \{10^{-2}, 10^{-3}, 10^{-5}, 10^{-8}, 0\}$ with $e_f = 2\epsilon_f$. Top row: metrics vs. iterations. Bottom row: contour plots with trajectories (after 100, 200, 2000, and 20000 iterations).
  • Figure 2: Sensitivity of Algorithm 1 on the Rosenbrock problem for $\epsilon_f = 10^{-3}$ and $e_f \in \{0.25\epsilon_f, 2\epsilon_f, 16\epsilon_f, 128\epsilon_f\}$. Results averaged over 10 runs. Top row: metrics vs. iterations. Bottom row: contour plots with trajectories (after 100, 200, 1000, and 5000 iterations).
  • Figure 3: Comparison of Algorithm 1 (SS2-NC-G), SS-G, and SS-NC-CG on the Rosenbrock problem with $\epsilon_f = 10^{-3}$ and $e_f = 2\epsilon_f$. Top row: metrics vs. function evaluations. Bottom row: contour plots with iterate trajectories (after 100, 500, 1000, and 5000 iterations).

Theorems & Definitions (32)

  • Remark 2.2
  • Remark 2.3
  • Definition 1
  • Remark 2.4
  • Remark 2.5
  • Lemma 2.6
  • proof
  • Proposition 2.7
  • Lemma 2.8
  • proof
  • ...and 22 more