Riemannian Adaptive Regularized Newton Methods with Hölder Continuous Hessians

Chenyu Zhang; Rujun Jiang

TL;DR

This work addresses nonconvex optimization on complete Riemannian manifolds with Hölder-continuous Hessians and retractions whose differentials are Hölder continuous. It introduces a unified Riemannian adaptive regularized Newton (RARN) framework that encompasses both Riemannian trust region (RTR) and Riemannian adaptive regularization (RAR) methods. It also derives nonasymptotic bounds on the number of iterations and Hessian-vector products, both governed by the single parameter $\alpha=\min\{\mu,\nu,\theta\}$. The main contributions are:

(i) a general $2+\omega$ regularization scheme whose rate is optimal when $\omega=\alpha$;
(ii) explicit iteration and operation complexity bounds for both RTR and RAR that recover the classical $O(\varepsilon^{-3/2})$ iteration rate when $\alpha=1$;
(iii) concrete subproblem solvers (Krylov/Lanczos methods and Minimal Eigenvalue Oracles) with sharp Hessian-vector product counts.

The results offer practical guidelines for choosing the regularization order and the solver inexactness so as to achieve optimal convergence on manifolds. Experiments show improved performance when Hölder-smoothness information is exploited.
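The $2+\omega$ regularization can be made concrete with a small model. This is a minimal Python sketch, assuming the common adaptive-regularization model $m_x(\eta) = f(x) + \langle \operatorname{grad} f(x), \eta\rangle + \tfrac12\langle \operatorname{Hess} f(x)[\eta], \eta\rangle + \tfrac{\sigma}{2+\omega}\|\eta\|^{2+\omega}$ on $T_x\mathcal{M}$. The helper names `hoelder_alpha` and `regularized_model` are hypothetical. So are the representation of tangent vectors by coordinates in an orthonormal basis and the restriction of $\mu,\nu,\theta$ to $(0,1]$. None of this is the paper's code.

```python
import math


def hoelder_alpha(mu: float, nu: float, theta: float) -> float:
    """alpha = min{mu, nu, theta}: the weakest of the Hessian's Hoelder order,
    the retraction differential's Hoelder order, and the solver inexactness order.
    Assumption (illustrative): all three orders lie in (0, 1]."""
    for name, val in (("mu", mu), ("nu", nu), ("theta", theta)):
        if not 0.0 < val <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1], got {val}")
    return min(mu, nu, theta)


def regularized_model(f_x, grad, hess, eta, sigma, omega):
    """Evaluate f(x) + <g, eta> + 1/2 <H eta, eta> + sigma/(2+omega) * ||eta||^(2+omega).

    grad and eta hold coordinates of tangent vectors in an (assumed) orthonormal
    basis of T_x M, and hess is the matching symmetric matrix, so plain
    Euclidean inner products stand in for the Riemannian metric.
    """
    n = len(eta)
    h_eta = [sum(hess[i][j] * eta[j] for j in range(n)) for i in range(n)]
    linear = sum(g * e for g, e in zip(grad, eta))
    quadratic = 0.5 * sum(e * he for e, he in zip(eta, h_eta))
    norm = math.sqrt(sum(e * e for e in eta))
    return f_x + linear + quadratic + sigma / (2.0 + omega) * norm ** (2.0 + omega)


if __name__ == "__main__":
    omega = hoelder_alpha(mu=1.0, nu=1.0, theta=0.5)  # regularization order 2 + 0.5
    print(regularized_model(1.0, [1.0, 0.0], [[2.0, 0.0], [0.0, -1.0]], [0.5, 0.5], 2.0, omega))
```

Setting `omega = hoelder_alpha(mu, nu, theta)` corresponds to the choice $\omega=\alpha$ that the summary identifies as optimal.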

Abstract

This paper presents strong worst-case iteration and operation complexity guarantees for Riemannian adaptive regularized Newton methods. These form a unified framework encompassing both Riemannian adaptive regularization (RAR) methods and Riemannian trust region (RTR) methods. We comprehensively characterize the three sources of approximation in second-order manifold optimization methods: the smoothness of the objective function, the smoothness of the retraction, and the inexactness of the subproblem solver. Specifically, consider a function with a $\mu$-Hölder continuous Hessian, a retraction with a $\nu$-Hölder continuous differential, and a $\theta$-inexact subproblem solver. Both RTR and RAR with $2+\alpha$ regularization, where $\alpha=\min\{\mu,\nu,\theta\}$, then locate an $(\varepsilon,\varepsilon^{\alpha/(1+\alpha)})$-approximate second-order stationary point within at most $O(\varepsilon^{-(2+\alpha)/(1+\alpha)})$ iterations and at most $\tilde{O}(\varepsilon^{-(4+3\alpha)/(2(1+\alpha))})$ Hessian-vector products. These complexity results are novel and sharp. When $\alpha=1$, they reduce to an iteration complexity of $O(\varepsilon^{-3/2})$ and an operation complexity of $\tilde{O}(\varepsilon^{-7/4})$.
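The exponents in these bounds are simple rational functions of $\alpha$. The sketch below evaluates them exactly with Python's `fractions.Fraction` and checks the $\alpha=1$ specialization. The function name `complexity_exponents` is hypothetical, and constants and logarithmic factors are ignored.

```python
from fractions import Fraction


def complexity_exponents(alpha) -> dict:
    """Exponents of 1/eps in the abstract's bounds, as exact fractions.

    hessian_tolerance:       eps_H = eps^(alpha/(1+alpha))
    iterations:              O(eps^-((2+alpha)/(1+alpha)))
    hessian_vector_products: O~(eps^-((4+3alpha)/(2(1+alpha)))), log factors dropped
    """
    alpha = Fraction(alpha)
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return {
        "hessian_tolerance": alpha / (1 + alpha),
        "iterations": (2 + alpha) / (1 + alpha),
        "hessian_vector_products": (4 + 3 * alpha) / (2 * (1 + alpha)),
    }


if __name__ == "__main__":
    for a in (Fraction(1, 4), Fraction(1, 2), Fraction(1)):
        e = complexity_exponents(a)
        print(f"alpha={a}: iterations eps^-{e['iterations']}, "
              f"HVPs eps^-{e['hessian_vector_products']}, eps_H = eps^{e['hessian_tolerance']}")
```

Both exponents decrease as $\alpha$ grows and approach 2 as $\alpha \to 0$. Weaker smoothness or a looser subproblem solver therefore gives larger worst-case bounds.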

Paper Structure (30 sections, 21 theorems, 171 equations, 3 figures, 3 tables, 5 algorithms)

Introduction
Main Results
Related Work
Preliminaries
Riemannian Adaptive Regularized Newton Methods
Iteration Complexity Analysis Framework
Riemannian Adaptive $2+\alpha$ Regularization
Iteration Complexity of RAR
Riemannian Trust Region Methods
Iteration Complexity of RTR
Subproblem Solvers and Operation Complexity
Lanczos-Based Krylov Subspace Methods
Minimal Eigenvalue Oracle
Operation Complexity of RAR
Operation Complexity of RTR
...and 15 more sections

Key Result

Proposition 1

Suppose $R_{x}$ has a Hölder continuous differential with order $\nu\in(0,1]$ and constant $C_{R}$. For any $x\in\mathcal{M}$ and $\eta\in T_{x}\mathcal{M}$, it holds that and Moreover, if the operator norm of $\operatorname{Hess} f$ is upper bounded by $\beta_{H}$, then the discrepancy between their composition with the objective function is bounded by

Figures (3)

Figure 1: Illustration of solutions for different regularization orders. In this example, $n=2$, $c=1$, $A = Z+Z^{T}$ with $Z \sim \mathcal{N}_{3\times 3}(0,1)$, and $b = (1,0,0)$ (labeled as the base point). The colorbar indicates the value of $f_1$ on the sphere $\mathbb{S}^{2}$. In each point's text label, the first component is the value of $f_1$ and the second is the distance regularization term.
Figure 2: Comparison of $\mu$-aware and $\mu$-agnostic RAR.
Figure 3: Comparison of $\mu$-aware and $\mu$-agnostic RTR.

Theorems & Definitions (44)

Definition 1: Approximate second-order stationary point
Definition 2: Hölder continuity of objective's Hessian
Definition 3: Hölder continuity of retraction's differential
Proposition 1: Retraction properties
Proposition 2: Termination criteria
Lemma 1: Decomposition of total number of iterations
proof
Remark 1
Lemma 2: Number of successful iterations
proof
...and 34 more
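Checking the $(\varepsilon, \varepsilon^{\alpha/(1+\alpha)})$ target requires an estimate of $\lambda_{\min}(\operatorname{Hess} f(x))$ that uses Hessian-vector products alone. The sketch below assumes the usual convention for Definition 1: $\|\operatorname{grad} f(x)\| \le \varepsilon_g$ and $\lambda_{\min}(\operatorname{Hess} f(x)) \ge -\varepsilon_H$. For brevity it substitutes plain shifted power iteration for the Lanczos-based minimal eigenvalue oracle analyzed in the paper. Power iteration generally needs more Hessian-vector products than Lanczos. The sketch also needs an upper bound `shift` on the Hessian's operator norm, for example the bound $\beta_H$ from Proposition 1. The function names and the matrix-in-an-orthonormal-basis representation are hypothetical.

```python
import math
import random


def min_eig_shifted_power(hvp, dim, shift, iters=500, seed=0):
    """Estimate lambda_min(H) from Hessian-vector products only.

    Power iteration on (shift*I - H): if shift >= ||H||, this operator is PSD and
    its top eigenvector is the bottom eigenvector of H. Returns the Rayleigh
    quotient v^T H v of the final unit vector v.
    """
    rng = random.Random(seed)  # random start: almost surely not orthogonal to the target
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    nrm = math.sqrt(sum(x * x for x in v))
    v = [x / nrm for x in v]
    for _ in range(iters):
        hv = hvp(v)
        w = [shift * vi - hvi for vi, hvi in zip(v, hv)]
        nrm = math.sqrt(sum(x * x for x in w))
        if nrm == 0.0:  # v lies in the null space of shift*I - H; stop early
            break
        v = [x / nrm for x in w]
    hv = hvp(v)
    return sum(vi * hvi for vi, hvi in zip(v, hv))


def is_approx_sosp(grad_norm, lam_min, eps, alpha):
    """(eps, eps^(alpha/(1+alpha)))-approximate second-order stationarity (assumed convention)."""
    eps_h = eps ** (alpha / (1.0 + alpha))
    return grad_norm <= eps and lam_min >= -eps_h


if __name__ == "__main__":
    # Hypothetical Hessian diag(2, -1, 0.5) in an orthonormal tangent basis; shift = 3 >= ||H||.
    hvp = lambda v: [2.0 * v[0], -1.0 * v[1], 0.5 * v[2]]
    lam = min_eig_shifted_power(hvp, dim=3, shift=3.0)
    print(lam, is_approx_sosp(grad_norm=1e-5, lam_min=lam, eps=1e-4, alpha=1.0))
```

In the example, the estimate converges to the negative eigenvalue $-1$. The point is therefore flagged as not yet second-order stationary even though its gradient is small.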
