Local Convergence of Adaptively Regularized Tensor Methods

Karl Welzel; Yang Liu; Raphael A. Hauser; Coralia Cartis

TL;DR

This work confirms that adaptive higher-order methods achieve superlinear convergence for certain degenerate problems as long as $p$ is large enough, and provides sharp bounds on the order of convergence one can expect in the limit.

Abstract

Optimization methods that make use of derivatives of the objective function up to order $p > 2$ are called tensor methods. Among them, those that minimize a regularized $p$th-order Taylor expansion at each step have been shown to possess optimal global complexity, which improves as $p$ increases. The local convergence of such optimization algorithms on functions that have Lipschitz continuous $p$th derivatives and are uniformly convex of order $q$ has been studied by Doikov and Nesterov [Math. Program., 193 (2022), pp. 315--336]. We extend these local convergence results to locally uniformly convex functions and fully adaptive methods, which do not need knowledge of the Lipschitz constant, thus providing the first sharp local rates for AR$p$. We discuss the surprising new challenges posed by nonconvex local models and non-unique model minimizers. For $p > 2$, our examples show that, in particular when the global minimizer of the subproblem is used, not all iterations need to be successful, even asymptotically. The $p/(q-1)$th-order local convergence from the non-adaptive case is preserved for $p > q-1$ only if the "right" local model minimizer is used; otherwise the superlinear rate can degrade. We thus confirm that adaptive higher-order methods achieve superlinear convergence for certain degenerate problems as long as $p$ is large enough and provide sharp bounds on the order of convergence one can expect in the limit.
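
For orientation, the quantities named in the abstract can be written out as follows. This is a sketch in the standard AR$p$ notation from the literature, not a transcription of the paper's algorithm: the symbols $T_p$, $m_k$, $\sigma_k$, $\rho_k$, $\eta$ and $\mu$ are assumptions introduced here, and the paper's exact variant may differ in details.

$$
\begin{aligned}
T_p(x_k, s) &= f(x_k) + \sum_{j=1}^{p} \frac{1}{j!}\, \nabla^j f(x_k)[s]^j, \\
m_k(s) &= T_p(x_k, s) + \frac{\sigma_k}{p+1}\, \|s\|^{p+1}, \\
\rho_k &= \frac{f(x_k) - f(x_k + s_k)}{T_p(x_k, 0) - T_p(x_k, s_k)}, \\
f(y) &\ge f(x) + \nabla f(x)^\top (y - x) + \frac{\mu}{q}\, \|y - x\|^q .
\end{aligned}
$$

Here $s_k$ is an (approximate) minimizer of the model $m_k$, and the last line is uniform convexity of order $q$, required only for $x$, $y$ near the minimizer in the local setting. In this sketch an iteration counts as successful when $\rho_k \ge \eta$ for a fixed $\eta \in (0, 1)$; otherwise the step is rejected and $\sigma_k$ is increased, which is how an adaptive method can do without the Lipschitz constant. As a worked instance of the order $p/(q-1)$: for $p = 3$ it evaluates to $3$ when $q = 2$ and to $3/2$ when $q = 3$, while for $q = 4$ the condition $p > q-1$ fails and no superlinear rate is claimed.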

