Inexact Adaptive Cubic Regularization Algorithms on Riemannian Manifolds and Application

Z. Y. Li, X. M. Wang

TL;DR

This work develops an inexact Riemannian adaptive cubic regularization framework for large-scale, separable optimization on general manifolds by employing sub-sampled gradient and Hessian information. The SSRACR algorithm solves a cubic-regularized subproblem approximately at each iteration and updates a cubic parameter to ensure descent, with proven iteration complexity of $\mathcal{O}(\max\{\varepsilon_g^{-2},\varepsilon_H^{-3}\})$ to achieve $(\varepsilon_g,\varepsilon_H)$-optimality under specified accuracy assumptions. The analysis includes deterministic and true-gradient corollaries and discusses practical solution criteria (Cauchy/Eigenpoint) in the Euclidean subproblem. As an application, the method is tested on joint diagonalization on the Stiefel manifold, where SSRACR outperforms inexact trust-region methods in terms of efficiency and CPU time, illustrating its practical impact for large-scale Riemannian optimization problems.
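The Cauchy criterion mentioned above can be illustrated in a small Euclidean setting. The sketch below is not the paper's implementation: it assumes the standard cubic model $m(s)=\langle g,s\rangle+\tfrac12\langle s,Hs\rangle+\tfrac{\sigma}{3}\|s\|^3$ with $\sigma>0$, minimizes it along $-g$ in closed form, and pairs it with a generic accept/reject rule for the cubic parameter. The function names and the constants `eta`, `gamma`, and `sigma_min` are hypothetical choices made for illustration only.

```python
import math


def cubic_model(g, H, sigma, s):
    """Evaluate m(s) = <g, s> + 0.5 <s, H s> + (sigma / 3) ||s||^3."""
    Hs = [sum(h * sj for h, sj in zip(row, s)) for row in H]
    s_norm = math.sqrt(sum(si * si for si in s))
    return (sum(gi * si for gi, si in zip(g, s))
            + 0.5 * sum(si * hsi for si, hsi in zip(s, Hs))
            + sigma / 3.0 * s_norm ** 3)


def cauchy_step(g, H, sigma):
    """Minimize the cubic model along the direction -g (requires sigma > 0)."""
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        return [0.0] * len(g)
    Hg = [sum(h * gj for h, gj in zip(row, g)) for row in H]
    gHg = sum(gi * hgi for gi, hgi in zip(g, Hg))
    # Stationarity of m(-a g) in a >= 0 gives the quadratic
    #   sigma * ||g||^3 * a^2 + gHg * a - ||g||^2 = 0;
    # its positive root is the step length.
    disc = gHg * gHg + 4.0 * sigma * g_norm ** 5
    alpha = (-gHg + math.sqrt(disc)) / (2.0 * sigma * g_norm ** 3)
    return [-alpha * gi for gi in g]


def update_sigma(sigma, rho, eta=0.1, gamma=2.0, sigma_min=1e-6):
    """Generic adaptive rule (hypothetical constants, not the paper's).

    rho is the ratio of actual to predicted decrease. A successful step
    is accepted and sigma is relaxed; otherwise sigma is increased.
    """
    if rho >= eta:
        return True, max(sigma / gamma, sigma_min)
    return False, sigma * gamma


g = [1.0, 0.0]
H = [[1.0, 0.0], [0.0, 1.0]]
s = cauchy_step(g, H, sigma=1.0)
print(s, cubic_model(g, H, 1.0, s))
```

Because the step length is the exact minimizer along $-g$, the model value at the Cauchy point is negative whenever $g\neq 0$, which is the kind of sufficient-decrease property a complexity analysis typically builds on.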

Abstract

An adaptive cubic regularization algorithm that uses inexact gradients and Hessians is proposed on general Riemannian manifolds, together with its iteration complexity for reaching approximate second-order optimality under certain accuracy assumptions on the inexact gradient and Hessian. The algorithm extends the inexact adaptive cubic regularization algorithm with true gradients in [Math. Program., 184(1-2): 35-70, 2020] to more general cases, even in the Euclidean setting. As an application, the algorithm is used to solve the joint diagonalization problem on the Stiefel manifold. Numerical experiments illustrate that it performs better than the inexact trust-region algorithm in [Advances in Neural Information Processing Systems, 31, 2018].
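To make the notion of an inexact gradient and Hessian concrete, here is a minimal sketch of uniform sub-sampling for a finite-sum objective $f(x)=\frac1n\sum_{i=1}^n f_i(x)$. It is a Euclidean stand-in, not the Riemannian construction of the paper: the terms $f_i(x)=\tfrac12(a_i^\top x-b_i)^2$, the function name, and the choice of sampling with replacement are all assumptions made for illustration.

```python
import random


def subsampled_grad_hess(x, data, n_g, n_h, rng):
    """Average per-term gradients and Hessians over sampled index sets.

    data is a list of (a_i, b_i) pairs defining the hypothetical terms
    f_i(x) = 0.5 * (a_i . x - b_i)^2, so that
    grad f_i(x) = (a_i . x - b_i) a_i  and  Hess f_i(x) = a_i a_i^T.
    """
    n, d = len(data), len(x)
    # Uniform sampling with replacement (an assumption of this sketch).
    S_g = [rng.randrange(n) for _ in range(n_g)]
    S_h = [rng.randrange(n) for _ in range(n_h)]

    G = [0.0] * d
    for i in S_g:
        a, b = data[i]
        r = sum(aj * xj for aj, xj in zip(a, x)) - b
        for j in range(d):
            G[j] += r * a[j] / n_g

    H = [[0.0] * d for _ in range(d)]
    for i in S_h:
        a, _ = data[i]
        for j in range(d):
            for k in range(d):
                H[j][k] += a[j] * a[k] / n_h
    return G, H


rng = random.Random(0)
data = [([rng.gauss(0, 1), rng.gauss(0, 1)], rng.gauss(0, 1)) for _ in range(1000)]
G, H = subsampled_grad_hess([0.5, -0.5], data, n_g=100, n_h=50, rng=rng)
print(G, H)
```

Larger sample sizes `n_g` and `n_h` trade per-iteration cost for accuracy of the estimates, which is the trade-off that the accuracy assumptions in the analysis are meant to control.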

Paper Structure

This paper contains 4 sections, 9 theorems, 38 equations, 1 table, 1 algorithm.

Key Result

Lemma 2.3

Let $\delta, \delta_g, \delta_H\in(0,1)$ and let $R$ be a second-order retraction. Assume that the sampling is done uniformly at random to generate $S_g$ and $S_H$, and let $G$ and $H$ be defined by PGH. If the sample sizes $|S_{g}|$ and $|S_{H}|$ satisfy then the following estimates hold for any $x\in \mathcal{M}$ and any $\eta\in T_{x}\mathcal{M}$:

Theorems & Definitions (14)

  • Definition 2.1: retraction and second-order retraction
  • Remark 1
  • Definition 2.2: $(\varepsilon_g,\varepsilon_H)$-optimality
  • Lemma 2.3
  • Remark 2
  • Remark 3
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3
  • Lemma 3.4
  • ...and 4 more