On the Complexity of Lower-Order Implementations of Higher-Order Methods

Nikita Doikov, Geovani Nunes Grapiglia

TL;DR

The paper addresses non-convex optimization of a $p$-times differentiable function with Lipschitz continuous $p$th-order derivative by developing a lazy, lower-order method of order $(p-1)$ that uses finite-difference approximations of the $p$th derivative. By reusing a single $p$th-derivative approximation for up to $m$ iterations and adaptively tuning its estimates of the Lipschitz constants, the method needs $O\left(\epsilon^{-\frac{p+1}{p}}\right)$ iterations and, with $m=(p-1)n+1$, $O\left(n^{1/p}\epsilon^{-\frac{p+1}{p}}\right)$ calls to the $(p-1)$st-order oracle to reach $\epsilon$-stationarity. The approach extends lazy-update ideas beyond the Hessian case, providing a dimension-efficient, adaptive framework for higher-order smoothness without full $p$th-order information at every step. This improves the dimension dependence of previously known bounds for finite-difference tensor methods. The work opens avenues for further refinements, such as quasi-tensor updates and universal lower-order schemes for convex and non-convex settings.
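As a concrete reading of these bounds, substituting small values of $p$ into the stated formulas (a direct instantiation, not an additional result) gives:

$$
\begin{aligned}
p=2:&\quad m = n+1, \qquad O\left(\epsilon^{-3/2}\right) \text{ iterations}, \qquad O\left(n^{1/2}\epsilon^{-3/2}\right) \text{ first-order oracle calls},\\
p=3:&\quad m = 2n+1, \qquad O\left(\epsilon^{-4/3}\right) \text{ iterations}, \qquad O\left(n^{1/3}\epsilon^{-4/3}\right) \text{ second-order oracle calls}.
\end{aligned}
$$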

Abstract

In this work, we propose a method for minimizing non-convex functions with Lipschitz continuous $p$th-order derivatives, starting from $p \geq 1$. The method, however, only requires derivative information up to order $(p-1)$, since the $p$th-order derivatives are approximated via finite differences. To ensure oracle efficiency, instead of computing finite-difference approximations at every iteration, we reuse each approximation for $m$ consecutive iterations before recomputing it, with $m \geq 1$ as a key parameter. As a result, we obtain an adaptive method of order $(p-1)$ that requires no more than $O(\epsilon^{-\frac{p+1}{p}})$ iterations to find an $\epsilon$-approximate stationary point of the objective function and that, for the choice $m=(p-1)n + 1$, where $n$ is the problem dimension, takes no more than $O(n^{1/p}\epsilon^{-\frac{p+1}{p}})$ oracle calls of order $(p-1)$. This improves previously known bounds for tensor methods with finite-difference approximations in terms of the problem dimension.
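To make the lazy-reuse idea tangible, here is a minimal Python sketch for the special case $p=2$, where the Hessian is approximated from gradient differences and kept for $m$ iterations. It is not the paper's algorithm: the step is a simple gradient-regularized Newton step swapped in for the paper's regularized model minimization, the regularization parameter `sigma` is fixed rather than adaptive, the eigenvalue shift is an illustrative safeguard for non-convexity, and all names (`lazy_fd_newton`, `demo_grad`, the test function) are hypothetical.

```python
import numpy as np

def demo_grad(x):
    """Gradient of the hypothetical test function f(x) = 0.5*||x||^2 + 0.25*sum(x_i^4)."""
    return x + x**3

def lazy_fd_newton(grad, x0, m, sigma=1.0, h=1e-6, tol=1e-8, max_iter=200):
    """Gradient-only Newton-type loop with a lazily refreshed finite-difference Hessian.

    Returns (x, iterations, gradient_calls). Illustrative assumptions: fixed sigma,
    forward differences with step h, gradient-regularized Newton step.
    """
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    calls = 0
    H = np.eye(n)  # placeholder; overwritten at k = 0
    for k in range(max_iter):
        g = grad(x)
        calls += 1
        g_norm = np.linalg.norm(g)
        if g_norm <= tol:
            return x, k, calls
        if k % m == 0:
            # Refresh the Hessian approximation: n extra gradient calls,
            # one forward difference per coordinate direction.
            for i in range(n):
                e = np.zeros(n)
                e[i] = h
                H[:, i] = (grad(x + e) - g) / h
            calls += n
            H = 0.5 * (H + H.T)  # symmetrize the approximation
        # Regularization: sqrt(sigma*||g||) plus a shift that keeps H + lam*I
        # positive definite when the (stale) approximation is indefinite.
        lam = np.sqrt(sigma * g_norm) + max(0.0, -np.linalg.eigvalsh(H).min())
        x = x - np.linalg.solve(H + lam * np.eye(n), g)
    return x, max_iter, calls

# Usage: n = 5 and p = 2, so the abstract's choice m = (p-1)n + 1 gives m = 6.
x, iters, calls = lazy_fd_newton(demo_grad, np.ones(5), m=6)
print(iters, calls, np.linalg.norm(demo_grad(x)))
```

Each block of $m$ iterations in this sketch costs $m + n$ gradient calls instead of the $m(n+1)$ that a refresh at every iteration would need, which is the source of the oracle savings the abstract describes.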

Paper Structure

This paper contains 8 sections, 12 theorems, 99 equations, 1 table, 2 algorithms.

Key Result

Lemma 2.1

Suppose that Assumption A1 holds and let $x^{+}$ be an inexact minimizer of $M_{\bar{x},\sigma,p}(\,\cdot\,)$, defined in (ModelDef), satisfying the following condition: [...]. If $\sigma\geq 2L$ and, for some $z\in\mathbb{R}^{n}$ and $\delta>0$, we have [...], then [...].

Theorems & Definitions (24)

  • Lemma 2.1
  • proof
  • Lemma 2.2
  • proof
  • Lemma 2.3
  • proof
  • Lemma 2.4
  • proof
  • Remark 2.5
  • Lemma 2.6
  • ...and 14 more