
Curvature-Aware Derivative-Free Optimization

Bumsu Kim, HanQin Cai, Daniel McKenzie, Wotao Yin

TL;DR

This paper addresses derivative-free optimization (DFO) in high dimensions where gradients are unavailable. It introduces Curvature-Aware Random Search (CARS), which takes one-dimensional Newton-style updates along random directions, using finite-difference estimates of the first and second derivatives to compute a candidate step size $\alpha_+$. A safeguarding mechanism ensures descent at every iteration, while a cubic-regularized variant (CARS-CR) extends the approach to general convex functions, achieving $\mathcal{O}(k^{-1})$ convergence under standard smoothness assumptions. The authors prove linear convergence in expectation for strongly convex objectives under mild conditions on the sampling distribution, and they characterize those conditions through $\eta(g,H;\mathcal{D})$ and $p_\gamma$, with concrete lower bounds for common isotropic distributions. Empirically, CARS and CARS-CR outperform state-of-the-art zeroth-order methods on convex and nonconvex benchmarks and perform strongly in black-box adversarial attacks. Open-source implementations are available.

Abstract

The paper discusses derivative-free optimization (DFO), which involves minimizing a function without access to gradients or directional derivatives, using only function evaluations. Classical DFO methods such as Nelder-Mead and direct search, which mimic gradient-based methods, have limited scalability for high-dimensional problems. Zeroth-order methods have been gaining popularity due to the demands of large-scale machine learning applications, and the paper focuses on the selection of the step size $\alpha_k$ in these methods. The proposed approach, called Curvature-Aware Random Search (CARS), uses first- and second-order finite difference approximations to compute a candidate $\alpha_{+}$. We prove that for strongly convex objective functions, CARS converges linearly provided that the search direction is drawn from a distribution satisfying very mild conditions. We also present a Cubic Regularized variant of CARS, named CARS-CR, which converges at a rate of $\mathcal{O}(k^{-1})$ without the assumption of strong convexity. Numerical experiments show that CARS and CARS-CR match or exceed the state of the art on benchmark problem sets.
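The step-size idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `cars_step`, the sampling radius `r`, the scaling constant `L_hat`, and the choice of a uniformly random unit direction are all assumptions made for illustration. The sketch estimates the directional first and second derivatives by central differences, forms a one-dimensional Newton-style candidate step, and then keeps whichever evaluated point has the lowest function value, so the objective never increases.

```python
import numpy as np

def cars_step(f, x, r=1e-3, L_hat=1.0, rng=None):
    """One safeguarded curvature-aware random-search step (illustrative sketch).

    f      : objective, maps a 1-D numpy array to a float
    x      : current iterate
    r      : finite-difference sampling radius (assumed value)
    L_hat  : scaling of the curvature estimate (assumed value)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Random unit search direction (assumption: uniform on the sphere).
    u = rng.standard_normal(x.shape)
    u /= np.linalg.norm(u)

    f0, fp, fm = f(x), f(x + r * u), f(x - r * u)

    # Central-difference estimates of the directional derivatives along u.
    d = (fp - fm) / (2.0 * r)            # first derivative
    h = (fp - 2.0 * f0 + fm) / r**2      # second derivative

    # Points already evaluated; the safeguard picks the best of them.
    candidates = [(f0, x), (fp, x + r * u), (fm, x - r * u)]

    # Newton-style candidate step, only when the curvature estimate is positive.
    if h > 0:
        alpha_plus = d / (L_hat * h)
        x_cand = x - alpha_plus * u
        candidates.append((f(x_cand), x_cand))

    # Safeguard: return the point with the smallest function value.
    return min(candidates, key=lambda c: c[0])[1]
```

Calling `cars_step` repeatedly from a starting point gives a simple optimization loop. Each call costs at most four function evaluations in this sketch, and because the current iterate is always among the candidates, the sequence of function values is non-increasing.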

Paper Structure

This paper contains 17 sections, 12 theorems, 88 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Lemma 1

If $f$ is $\mu$-strongly convex, then $f$ is $\hat{\mu}$-relatively convex and $\hat{L}$-relatively smooth for some $\hat{L} \geq \hat{\mu}>0$, i.e. for all $x, y \in \mathcal{Q}$

Figures (3)

  • Figure 1: Performance of each algorithm on a convex quartic function $f(x) = 0.1\sum_{i=1}^{d} x_i^4 + \frac{1}{2}x^{\top}Ax + 0.01\|x\|^2$, where $A = G^{\top}G$ with $G_{ij} \stackrel{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$. The problem dimension is $d = 30$.
  • Figure 2: Performance profiles on Moré-Garbow-Hillstrom problems (upper) and CUTEst problems (lower), for target accuracies $\varepsilon = 10^{-1}$ (left), $10^{-3}$ (middle), and $10^{-5}$ (right). CARS and CARS-CR consistently outperform the other methods in both efficiency ($\rho$ at low $\tau$ values) and robustness ($\rho$ at high $\tau$ values) at all levels of accuracy.
  • Figure 3: Adversarial examples with misclassified labels on MNIST generated with CARS.
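
The convex quartic test function from the Figure 1 caption can be written down directly. The following is a minimal sketch, not the authors' benchmark code: the helper name `make_convex_quartic`, the seed, and the assumption that $G$ is a square $d \times d$ matrix are illustrative choices that the caption does not specify.

```python
import numpy as np

def make_convex_quartic(d=30, seed=0):
    """Build f(x) = 0.1 * sum(x_i^4) + 0.5 * x^T A x + 0.01 * ||x||^2.

    A = G^T G with i.i.d. standard normal entries in G.
    Assumptions: G is d-by-d, and the seed is arbitrary.
    """
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((d, d))
    A = G.T @ G  # symmetric positive semidefinite by construction

    def f(x):
        x = np.asarray(x, dtype=float)
        return 0.1 * np.sum(x**4) + 0.5 * x @ A @ x + 0.01 * x @ x

    return f, A
```

Each term is convex (the quartic sum, the quadratic form with positive semidefinite $A$, and the squared norm), so the sum is convex with its minimum value of 0 at the origin.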

Theorems & Definitions (27)

  • Definition 1
  • Definition 2
  • Lemma 1: $\hat{L}$-Relative Smoothness and $\hat{\mu}$-Relative Convexity
  • Theorem 2: Expected descent of CARS
  • Corollary 3: Convergence of CARS
  • Lemma 4
  • proof
  • Lemma 5: Estimation and Lower Bounds of $p_{\gamma}$ for Various Distributions
  • proof
  • Corollary 6
  • ...and 17 more