A randomized algorithm for nonconvex minimization with inexact evaluations and complexity guarantees

Shuyao Li, Stephen J. Wright

TL;DR

This work minimizes a smooth nonconvex function using inexact oracle access to the gradient and Hessian, with the goal of reaching approximate second-order optimality. Applied to empirical risk minimization problems, the algorithm obtains improved gradient sample complexity over existing works.

Abstract

We consider minimization of a smooth nonconvex function with inexact oracle access to gradient and Hessian (without assuming access to the function value) to achieve approximate second-order optimality. A novel feature of our method is that if an approximate direction of negative curvature is chosen as the step, we choose its sense to be positive or negative with equal probability. We allow gradients to be inexact in a relative sense and relax the coupling between inexactness thresholds for the first- and second-order optimality conditions. Our convergence analysis includes both an expectation bound based on martingale analysis and a high-probability bound based on concentration inequalities. We apply our algorithm to empirical risk minimization problems and obtain improved gradient sample complexity over existing works.
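
To make the randomized-sign idea concrete, here is a minimal Python sketch of a single iteration. It is an illustration under stated assumptions, not the paper's algorithm: the function name `randomized_step`, the oracle arguments, the gradient step length `1 / L_g`, and the negative curvature step length `alpha_nc * |lambda|` are all hypothetical choices made for this example, and the inexact oracles are modeled simply as callables that return approximate values. The only feature taken from the abstract is that the sense of a negative curvature step is chosen to be positive or negative with equal probability, and that no function values are evaluated.

```python
import numpy as np


def randomized_step(x, grad_oracle, hess_oracle, eps_g, eps_H, L_g, alpha_nc, rng):
    """One illustrative iteration; returns (new_x, kind_of_step).

    grad_oracle / hess_oracle: callables returning (possibly inexact)
    gradient and Hessian estimates. All step lengths are assumptions.
    """
    g = grad_oracle(x)
    if np.linalg.norm(g) > eps_g:
        # Large (approximate) gradient: take a gradient step.
        return x - g / L_g, "gradient"

    H = hess_oracle(x)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(H)
    lam, v = eigvals[0], eigvecs[:, 0]
    if lam < -eps_H:
        # Negative curvature step: no function values are used, so the
        # sense of the direction is chosen uniformly at random from {-1, +1}.
        sign = rng.choice([-1.0, 1.0])
        return x + sign * alpha_nc * abs(lam) * v, "negative_curvature"

    # Small gradient and no significant negative curvature: stop.
    return x, "terminate"


if __name__ == "__main__":
    # Saddle point of f(x) = x0^2 - x1^2 at the origin (exact oracles here).
    grad = lambda x: np.array([2.0 * x[0], -2.0 * x[1]])
    hess = lambda x: np.diag([2.0, -2.0])
    rng = np.random.default_rng(0)
    x_new, kind = randomized_step(np.zeros(2), grad, hess, 1e-3, 1e-3, 2.0, 0.5, rng)
    print(kind, x_new)
```

In this toy saddle example the gradient vanishes at the origin, so the sketch moves along the eigenvector of the most negative eigenvalue, in whichever sense the coin flip selects.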

Paper Structure

This paper contains 15 sections, 10 theorems, 60 equations, and 1 algorithm.

Key Result

Proposition 2.5

If Algorithm 1 terminates and returns $x_{n}$, then $x_{n}$ is a $(\frac{4}{3}\epsilon_g, \frac{4}{3}\epsilon_{H})$-approximate second-order stationary point.
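
For orientation, the paper's own definition is not reproduced in this summary. Assuming the usual convention that an $(\epsilon_g, \epsilon_H)$-approximate second-order stationary point has gradient norm at most $\epsilon_g$ and smallest Hessian eigenvalue at least $-\epsilon_H$, the conclusion of the proposition would read:

$$
\|\nabla f(x_{n})\| \le \tfrac{4}{3}\epsilon_g
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 f(x_{n})\right) \ge -\tfrac{4}{3}\epsilon_{H}.
$$

Under that assumed convention, the factor $\frac{4}{3}$ is the slack between the tolerances tested on the inexact gradient and Hessian and the guarantee obtained for the exact ones.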

Theorems & Definitions (29)

  • Definition 1.1: Lipschitz continuity
  • Proposition 2.5
  • proof
  • Theorem 2.6
  • proof: Proof of Theorem 2.6
  • Remark 2.10
  • Theorem 2.11
  • Corollary 2.12: Short-Step Negative Curvature Updates
  • proof
  • Corollary 2.13
  • ...and 19 more