Exploiting higher-order derivatives in convex optimization methods

Dmitry Kamzolov; Alexander Gasnikov; Pavel Dvurechensky; Artem Agafonov; Martin Takáč

TL;DR

A series of lower iteration complexity bounds for higher-order (tensor) methods in convex optimization was proved, revealing a gap between the upper and lower complexity bounds. These developments made it possible to propose a second-order method with convergence rate $1/k^5$, which is faster than the rate of existing second-order methods.

Abstract

The use of higher-order derivatives in convex optimization has been known since at least the 1970s. In each iteration, higher-order (also called tensor) methods minimize a regularized Taylor expansion of the objective function, which leads to faster convergence rates if the corresponding higher-order derivative is Lipschitz-continuous. Recently, a series of lower iteration complexity bounds for such methods was proved, and a gap between the upper and lower complexity bounds was revealed. Moreover, it was shown that such methods are implementable, since the appropriately regularized Taylor expansion of a convex function is also convex and thus can be minimized in polynomial time. Only very recently was an algorithm with the optimal convergence rate $1/k^{(3p+1)/2}$ proposed for minimizing convex functions with Lipschitz $p$-th derivative. For convex functions with Lipschitz third derivative, these developments made it possible to propose a second-order method with convergence rate $1/k^5$, which is faster than the rate $1/k^{3.5}$ of existing second-order methods.
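To illustrate the idea of minimizing a regularized Taylor expansion, here is a minimal sketch for the simplest nontrivial case $p = 2$ (a cubic-regularized Newton step) in one dimension. It is not the paper's algorithm: the test function $f(x) = \log(e^x + e^{-x})$, the regularization constant `M = 1.0`, and all function names are assumptions chosen for illustration. In one dimension the model $g h + \tfrac{1}{2} a h^2 + \tfrac{M}{6}|h|^3$ has a closed-form minimizer, which the sketch uses directly.

```python
import math

# Hypothetical test problem: f(x) = log(exp(x) + exp(-x)), convex, minimizer x = 0.
def f(x):
    return math.log(math.exp(x) + math.exp(-x))

def df(x):
    return math.tanh(x)

def d2f(x):
    return 1.0 - math.tanh(x) ** 2

def cubic_step(g, a, M):
    """Minimize the model g*h + 0.5*a*h**2 + (M/6)*|h|**3 over h (a >= 0, M > 0).

    Stationarity gives |g| = a*r + (M/2)*r**2 with r = |h|, and the step
    points against the sign of the gradient g.
    """
    r = (-a + math.sqrt(a * a + 2.0 * M * abs(g))) / M
    return -math.copysign(r, g)

def cubic_newton(x0, M=1.0, iters=10):
    """Run the regularized second-order step repeatedly; returns all iterates."""
    xs = [x0]
    for _ in range(iters):
        x = xs[-1]
        xs.append(x + cubic_step(df(x), d2f(x), M))
    return xs

# M = 1.0 is an assumed value; it exceeds the Lipschitz constant of f'' here,
# since |f'''(x)| <= 4 / (3 * sqrt(3)) < 1 for this particular f.
trajectory = cubic_newton(3.0)
```

Starting from `x0 = 3.0`, where the second derivative is close to zero and an unregularized Newton step would be very long, the cubic term keeps each step bounded while the iterates still approach the minimizer at zero.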

Paper Structure (10 sections, 3 theorems, 36 equations, 3 algorithms)


Introduction
Notation and problem statement
Optimal Tensor methods
Superfast acceleration and the structure of the auxiliary problem
Tensor methods and stochastic distributed setup
Stochastic Optimization
Distributed optimization
Conclusion
MSC codes
Cross‐References

Key Result

Theorem 1

[gasnikov2021accelerated] Let $y_k$ be the output point of Algorithm MSN($x_0$, $f$, $g$, $p$, $H$, $k$) after $k$ iterations, where $p \geq 1$ and $H \geq (p+1)L_{p,f}$. Then
$$F(y_k) - F(x_{\ast}) \leq \frac{c_p H R^{p+1}}{k^{\frac{3p+1}{2}}},$$
where $c_p = 2^{p-1} (p+1)^{\frac{3p+1}{2}} / p!$ and $R = \|x_0 - x_{\ast}\|$. Moreover, when $p \geq 2$, to reach accuracy $\varepsilon$: $F(y_k) - F(x_{\ast}) \leq \varepsilon$, it is required to solve the auxiliary problem (prox_step), to f…
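To get a feel for the constants in Theorem 1, the following sketch evaluates $c_p$ and estimates how many iterations a bound of the form $F(y_k) - F(x_{\ast}) \leq c_p H R^{p+1} / k^{(3p+1)/2}$ would require to reach a target accuracy. The form of the bound is taken here as an assumption, consistent with the rate $1/k^{(3p+1)/2}$ stated in the abstract; the function names and the sample values of `H`, `R`, and `eps` are hypothetical.

```python
import math

def c_p(p):
    """Constant c_p = 2^(p-1) * (p+1)^((3p+1)/2) / p! from Theorem 1."""
    return 2 ** (p - 1) * (p + 1) ** ((3 * p + 1) / 2) / math.factorial(p)

def iterations_needed(p, H, R, eps):
    """Smallest k with c_p * H * R^(p+1) / k^((3p+1)/2) <= eps.

    Assumes the bound has exactly this form (an assumption, see the text).
    """
    rhs = c_p(p) * H * R ** (p + 1) / eps
    return max(1, math.ceil(rhs ** (2 / (3 * p + 1))))

# Illustrative values only: H = R = 1 and eps = 1e-6 are not from the paper.
for p in (1, 2, 3):
    print(p, c_p(p), iterations_needed(p, H=1.0, R=1.0, eps=1e-6))
```

For $p = 1$ the constant is $c_1 = 4$ and the exponent is $2$, the familiar accelerated first-order rate. The constant grows quickly with $p$, but the larger exponent $(3p+1)/2$ outweighs it once $\varepsilon$ is small.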

Theorems & Definitions (4)

Theorem 1
Theorem 2
Definition 3
Theorem 4
