History-Aware Adaptive High-Order Tensor Regularization

Chang He, Bo Jiang, Yuntian Jiang, Chuwen Zhang, Shuzhong Zhang

TL;DR

The paper addresses the challenge of selecting adaptive regularization parameters in high-order tensor methods for composite optimization without a known Lipschitz constant. It introduces History-aware Adaptive Regularization (HAR), which uses a history of local $p$th-order Lipschitz estimates to set the current regularization parameter, achieving complexity guarantees comparable to standard methods that assume a known Lipschitz constant. The authors establish convex and nonconvex iteration complexities, propose practical variants HAR-C and HAR-S with budgeted history, and develop HAR-A for accelerated convex optimization. They also provide extensive numerical experiments on convex problems and the CUTEst dataset, demonstrating the practical effectiveness of partial-history variants and underscoring the method's potential for real-world high-order optimization tasks.
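To make the history idea concrete, here is a minimal sketch for the simplest case, $p = 1$ with no nonsmooth term. It is not the paper's algorithm: the function name `har_gradient_sketch`, the secant-based local estimate, the sufficient-decrease test, and the rule "take the maximum of the stored estimates" are all assumptions chosen for illustration. Passing `budget=None` keeps the full history, while an integer `budget` keeps only a sliding window, loosely in the spirit of the budgeted variants.

```python
from collections import deque
import math


def har_gradient_sketch(grad, f, x0, h0=0.1, budget=None, tol=1e-8, max_iter=2000):
    """Illustrative p = 1 scheme (an assumption, not the paper's method).

    The regularization parameter H is the maximum of the stored local
    Lipschitz estimates. budget=None stores every estimate; an integer
    budget stores only the most recent `budget` estimates.
    """
    history = deque(maxlen=budget)  # maxlen=None means an unbounded history
    x = list(x0)
    rejected = 0
    for _ in range(max_iter):
        g = grad(x)
        g_norm = math.sqrt(sum(gi * gi for gi in g))
        if g_norm <= tol:
            break
        # Use the initial guess h0 only until the first estimate is available.
        h = max(history) if history else h0
        # Regularized first-order step: minimize the linear model plus (h/2)||s||^2.
        trial = [xi - gi / h for xi, gi in zip(x, g)]
        g_trial = grad(trial)
        step_norm = g_norm / h
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(g_trial, g)))
        # Secant estimate of the local gradient Lipschitz constant,
        # floored so that later divisions by h stay well defined.
        history.append(max(diff / step_norm, 1e-12))
        # Hypothetical acceptance rule: require a sufficient decrease in f.
        if f(x) - f(trial) >= g_norm ** 2 / (4.0 * h):
            x = trial
        else:
            rejected += 1  # unsuccessful iteration: keep x, retry with updated H
    return x, rejected, list(history)


# Toy problem: a diagonal quadratic whose gradient is Lipschitz with constant 10.
a = [1.0, 10.0]
f = lambda x: 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))
grad = lambda x: [ai * xi for ai, xi in zip(a, x)]

x_full, rej_full, hist_full = har_gradient_sketch(grad, f, [1.0, 1.0])
x_win, rej_win, hist_win = har_gradient_sketch(grad, f, [1.0, 1.0], budget=3)
```

With the full history, `H` can only grow, which is safe but may become conservative once the iterates reach a flatter region. The windowed run forgets old estimates, so `H` can shrink again at the price of occasional rejected steps.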

Abstract

In this paper, we develop a new adaptive regularization method for minimizing a composite function, which is the sum of a $p$th-order ($p \ge 1$) Lipschitz continuous function and a simple, convex, and possibly nonsmooth function. We use a history of local Lipschitz estimates to adaptively select the current regularization parameter, an approach we term the history-aware adaptive regularization method. We explore how the amount of historical information used affects both the theoretical and practical performance. By using all the historical information, our method matches the complexity guarantees of the standard $p$th-order tensor methods that require a known Lipschitz constant, for both convex and nonconvex objectives. In the nonconvex case, the number of iterations required to find an $(\epsilon_g,\epsilon_H)$-approximate second-order stationary point is bounded by $\mathcal{O}(\max\{\epsilon_g^{-(p+1)/p}, \epsilon_H^{-(p+1)/(p-1)}\})$. For convex functions, we establish an $\mathcal{O}(\epsilon^{-1/p})$ iteration complexity for finding an $\epsilon$-approximate optimal point and further propose an accelerated variant attaining an iteration complexity of $\mathcal{O}(\epsilon^{-1/(p+1)})$. For practical purposes, we propose several variants of this method that use only part of the historical information. We introduce cyclic and sliding-window strategies for choosing historical Lipschitz estimates, which mitigate overly conservative updates. As long as a rough upper bound on the Lipschitz constant is known, these two variants achieve the same iteration complexity guarantees in terms of the input accuracy as the method using full historical information. Finally, extensive numerical experiments demonstrate the effectiveness of our adaptive approach.
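As a worked instance of the rates stated in the abstract, substituting $p = 2$ and $p = 3$ into the exponents gives the bounds below. This is plain arithmetic on the stated formulas, not an additional result; the $p = 2$ line recovers the rates familiar from cubic regularization.

$$
\begin{aligned}
p = 2:&\quad \text{nonconvex } \mathcal{O}\big(\max\{\epsilon_g^{-3/2},\, \epsilon_H^{-3}\}\big), \quad \text{convex } \mathcal{O}\big(\epsilon^{-1/2}\big), \quad \text{accelerated } \mathcal{O}\big(\epsilon^{-1/3}\big), \\
p = 3:&\quad \text{nonconvex } \mathcal{O}\big(\max\{\epsilon_g^{-4/3},\, \epsilon_H^{-2}\}\big), \quad \text{convex } \mathcal{O}\big(\epsilon^{-1/3}\big), \quad \text{accelerated } \mathcal{O}\big(\epsilon^{-1/4}\big).
\end{aligned}
$$

For example, with $p = 2$ and $\epsilon = 10^{-6}$, the convex bound scales like $\epsilon^{-1/2} = 10^{3}$ iterations and the accelerated bound like $\epsilon^{-1/3} = 10^{2}$, up to constants.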

Paper Structure

This paper contains 16 sections, 18 theorems, 102 equations, 2 figures, 1 table, 4 algorithms.

Key Result

Lemma 3.1

Suppose that the function $f$ satisfies the Lipschitz continuity condition (eq:Lipchitz continuous). Then the cardinality of the unsuccessful index set $\mathcal{U}$ in the algorithm (alg:ac pth-order method) satisfies a bound expressed in terms of $H_{\max} = \max \{H_0, L_p\}$.

Figures (2)

  • Figure 1: Logistic regression on the LIBSVM datasets. The number in brackets is the budget size $\mathcal{B}$.
  • Figure 2: Performance profiles of selected CUTEst instances. Panel (a) shows gradient evaluations; panel (b) shows Hessian evaluations.

Theorems & Definitions (35)

  • Definition 2.1
  • Definition 2.2
  • Lemma 3.1
  • proof
  • Theorem 3.1
  • proof
  • Corollary 3.1
  • Lemma 3.2
  • proof
  • Theorem 3.2
  • ...and 25 more