A Gradient Sampling Algorithm for Noisy Nonsmooth Optimization

Albert S. Berahas, Frank E. Curtis, Lara Zebiane

Abstract

An algorithm is proposed, analyzed, and tested for minimizing locally Lipschitz objective functions that may be nonconvex and/or nonsmooth. The algorithm, which is built upon the gradient-sampling methodology, is designed specifically for cases when objective function and generalized gradient values might be subject to bounded uncontrollable errors. Similarly to state-of-the-art guarantees for noisy smooth optimization of this kind, it is proved for the algorithm that, with probability one, either the sequence of objective function values will decrease without bound or the algorithm will generate an iterate at which a measure of stationarity is below a threshold that depends proportionally on the error bounds for the objective function and generalized gradient values. The results of numerical experiments are presented, which show that the algorithm can indeed perform approximate optimization robustly despite errors in objective and generalized gradient values.
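
The abstract suggests the following shape for a single iteration: sample (noisy) generalized gradients near the current iterate, take the (approximate) minimum-norm element of their convex hull as both a stationarity measure and a search direction, and relax the line-search acceptance test to account for the bounded function-value errors. The sketch below is a minimal illustration of that idea only; it is not the paper's algorithm, and the parameter names, the Frank-Wolfe subproblem solver, and the $2\epsilon_f$ relaxation of the Armijo test are choices made here for the sake of a self-contained example.

```python
# Illustrative sketch of one gradient-sampling step under bounded noise.
# NOT the paper's algorithm; tolerances and the line-search rule are assumptions.
import numpy as np

def min_norm_in_hull(grads, iters=200):
    """Approximate the minimum-norm element of conv{rows of grads}
    via Frank-Wolfe on the unit simplex."""
    p = grads.mean(axis=0)                # start from the average gradient
    for _ in range(iters):
        j = np.argmin(grads @ p)          # vertex minimizing the linearization
        d = grads[j] - p
        denom = d @ d
        if denom <= 1e-16:
            break
        t = np.clip(-(p @ d) / denom, 0.0, 1.0)  # exact step for the quadratic
        p = p + t * d
    return p

def gs_step(x, f_noisy, g_noisy, radius, eps_f, m=10, beta=1e-4, rng=None):
    """One noise-aware gradient-sampling step (sketch).

    f_noisy, g_noisy: oracles returning function / generalized-gradient values
    subject to bounded errors.  Returns the new iterate and sampling radius.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    # Sample gradients at x and at m nearby points within the sampling radius.
    pts = x + radius * rng.uniform(-1.0, 1.0, size=(m, n))
    grads = np.vstack([g_noisy(x)] + [g_noisy(y) for y in pts])
    g = min_norm_in_hull(grads)           # approximate stationarity measure
    if np.linalg.norm(g) <= 1e-6:         # (near-)stationary for this sample:
        return x, 0.5 * radius            # shrink the sampling radius
    d = -g
    t, fx = 1.0, f_noisy(x)
    # Backtracking line search; the sufficient-decrease test is relaxed by
    # 2*eps_f to tolerate bounded function-value errors (an assumption here).
    while t > 1e-12 and f_noisy(x + t * d) > fx - beta * t * (g @ g) + 2.0 * eps_f:
        t *= 0.5
    if t <= 1e-12:                        # no acceptable step found
        return x, 0.5 * radius
    return x + t * d, radius
```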

Paper Structure

This paper contains 14 sections, 5 theorems, 46 equations, 6 figures, and 1 algorithm.

Key Result

Lemma C.1

Suppose that ${\cal G} \subset \mathbb{R}^{n}$ is nonempty, convex, and compact, and that $\|g\|_2> 3\epsilon_g$ for all $g\in{\cal G}$. Let $\tilde{{\cal G}}$ be any nonempty, convex, and compact set such that $\mathop{\mathrm{dist}}\nolimits_{\cal G}(\tilde{g})\leq\epsilon_g$ for all $\tilde{g}\in\tilde{{\cal G}}$.
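
One immediate consequence of these hypotheses (not necessarily the lemma's full conclusion) follows from the triangle inequality: for every $\tilde{g}\in\tilde{{\cal G}}$ there exists, by compactness of ${\cal G}$, some $g\in{\cal G}$ with $\|\tilde{g}-g\|_2\leq\epsilon_g$, so that $\|\tilde{g}\|_2\geq\|g\|_2-\epsilon_g>3\epsilon_g-\epsilon_g=2\epsilon_g$. In words, errors bounded by $\epsilon_g$ cannot make a set of uniformly nonstationary generalized gradients appear (near-)stationary.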

Figures (6)

  • Figure 3: On the left, the absolute value function and its corresponding generalized-gradient/subgradient mapping. On the right, the same mappings subject to bounded errors. We emphasize that, typically in noisy optimization, the mapping $\tilde{g}$ does not correspond to the derivative function of $\tilde{f}$; rather, one can expect that $\tilde{g}$ approximates $g$ directly, as shown in the graphs. That being said, the graphs show that for Assumption \ref{ass.g} to hold for this case, one clearly needs $\epsilon_g$ greater than the lower bound stated in \ref{eq.g_sup}; see, e.g., $x$ slightly to the right of the origin.
  • Figure 4: Surfaces of the nonsmooth Rosenbrock function \ref{eq.rosenbrock}. Left: the true surface near the global minimizer. Right: the surface corrupted by uniformly distributed noise with $\epsilon_f=1$ (a minimal sketch of such a noisy evaluation follows this list).
  • Figure 5: Rosenbrock tests of the noisy gradient algorithm across four noise levels $\epsilon_f$.
  • Figure 6: Iterate trajectories for different noise levels $\epsilon_f$.
  • Figure 7: Trade-off between total samples used throughout the optimization process (i.e., computational cost) and final classification accuracy as a function of the noise level.
  • ...and 1 more figure
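
The Figure 4 caption describes corrupting a nonsmooth Rosenbrock variant with uniformly distributed noise bounded by $\epsilon_f=1$. The snippet below is a minimal illustration of that kind of noisy oracle; the particular variant used here, $8|x_1^2-x_2|+(1-x_1)^2$, is a common choice in the gradient-sampling literature and is only a stand-in that may differ from the paper's definition in \ref{eq.rosenbrock}.

```python
import numpy as np

# Stand-in nonsmooth Rosenbrock variant (not necessarily the paper's definition).
def rosenbrock_ns(x):
    return 8.0 * abs(x[0] ** 2 - x[1]) + (1.0 - x[0]) ** 2

def noisy_oracle(f, eps_f, seed=0):
    """Wrap f with additive uniform noise bounded by eps_f, as in Figure 4."""
    rng = np.random.default_rng(seed)
    return lambda x: f(x) + rng.uniform(-eps_f, eps_f)

f_tilde = noisy_oracle(rosenbrock_ns, eps_f=1.0)
print(f_tilde(np.array([1.0, 1.0])))  # true value is 0.0; output lies in [-1, 1]
```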

Theorems & Definitions (11)

  • Example B.1
  • Lemma C.1 (with proof)
  • Lemma C.2 (with proof)
  • Lemma C.3 (with proof)
  • Lemma C.4 (with proof)
  • Theorem C.1
  • ...and 1 more