Riemannian Adaptive Regularized Newton Methods with Hölder Continuous Hessians

Chenyu Zhang, Rujun Jiang

TL;DR

This work addresses nonconvex optimization on complete Riemannian manifolds under Hölder-smooth Hessians and Hölder-smooth retractions. It introduces a unified Riemannian adaptive regularized Newton (RARN) framework that encompasses both Riemannian trust region (RTR) and adaptive regularization (RAR) methods, and derives nonasymptotic iteration and Hessian-vector operation complexity bounds driven by the shared parameter $\alpha=\min\{\mu,\nu,\theta\}$. The main contributions are: (i) a general $2+\omega$ regularization scheme with optimal rate achieved when $\omega=\alpha$, (ii) explicit iteration and operation complexity bounds for both RTR and RAR that recover the classical $O(\varepsilon^{-3/2})$ rate when $\alpha=1$, and (iii) concrete subproblem solvers (Krylov/Lanczos and Minimal Eigenvalue Oracles) yielding sharp Hessian-vector product counts. The results offer practical guidelines for choosing regularization order and inexactness to achieve optimal convergence on manifolds, with experimental evidence showing improved performance when exploiting Hölder-smoothness information.

Abstract

This paper presents strong worst-case iteration and operation complexity guarantees for Riemannian adaptive regularized Newton methods, a unified framework encompassing both Riemannian adaptive regularization (RAR) methods and Riemannian trust region (RTR) methods. We comprehensively characterize the sources of approximation in second-order manifold optimization methods: the objective function's smoothness, retraction's smoothness, and subproblem solver's inexactness. Specifically, for a function with a $\mu$-Hölder continuous Hessian, when equipped with a retraction featuring a $\nu$-Hölder continuous differential and a $\theta$-inexact subproblem solver, both RTR and RAR with $2+\alpha$ regularization (where $\alpha=\min\{\mu,\nu,\theta\}$) locate an $(\varepsilon,\varepsilon^{\alpha/(1+\alpha)})$-approximate second-order stationary point within at most $O(\varepsilon^{-(2+\alpha)/(1+\alpha)})$ iterations and at most $\tilde{O}(\varepsilon^{-(4+3\alpha)/(2(1+\alpha))})$ Hessian-vector products. These complexity results are novel and sharp, and reduce to an iteration complexity of $O(\varepsilon^{-3/2})$ and an operation complexity of $\tilde{O}(\varepsilon^{-7/4})$ when $\alpha=1$.
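As a quick sanity check on the rates quoted in the abstract, the exponents can be evaluated for a given triple $(\mu,\nu,\theta)$. The following is a minimal sketch, not code from the paper: the function name `complexity_exponents` and the dictionary keys are hypothetical, and the only assumption made about the orders is that they are positive. It uses exact rational arithmetic so that the $\alpha=1$ case reproduces $3/2$ and $7/4$ without rounding.

```python
from fractions import Fraction


def complexity_exponents(mu, nu, theta):
    """Evaluate the exponents appearing in the abstract's bounds.

    Hypothetical helper for illustration only. Inputs are the Hoelder
    orders of the Hessian (mu) and the retraction differential (nu),
    and the inexactness order of the subproblem solver (theta).
    """
    # The shared parameter alpha = min{mu, nu, theta}.
    alpha = min(Fraction(mu), Fraction(nu), Fraction(theta))
    if alpha <= 0:
        raise ValueError("all orders are assumed to be positive")
    return {
        "alpha": alpha,
        # Iterations: O(eps^-(2 + alpha) / (1 + alpha)).
        "iterations": (2 + alpha) / (1 + alpha),
        # Hessian-vector products: O~(eps^-(4 + 3 alpha) / (2 (1 + alpha))).
        "hessian_vector_products": (4 + 3 * alpha) / (2 * (1 + alpha)),
        # Second-order tolerance: eps^(alpha / (1 + alpha)).
        "second_order_tolerance": alpha / (1 + alpha),
    }


if __name__ == "__main__":
    for orders in [(1, 1, 1), (1, "1/2", 1)]:
        print(orders, complexity_exponents(*orders))
```

With all three orders equal to 1 this gives the exponents $3/2$ and $7/4$ stated above. Lowering any single order to $1/2$ gives $\alpha=1/2$ and the exponents $5/3$ and $11/6$, which illustrates that the smallest of the three orders determines the rate.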

Paper Structure

This paper contains 30 sections, 21 theorems, 171 equations, 3 figures, 3 tables, and 5 algorithms.

Key Result

Proposition 1

Suppose $R_{x}$ has a Hölder continuous differential with order $\nu\in(0,1]$ and constant $C_{R}$. For any $x\in\mathcal{M}$ and $\eta\in T_{x}\mathcal{M}$, it holds that […] and […]. Moreover, if the operator norm of $\operatorname{Hess} f$ is upper bounded by $\beta_{H}$, then the discrepancy between their composition with the objective function is bounded by […].

Figures (3)

  • Figure 1: Illustration of solutions for different regularization orders. In this example, $n=2$, $c=1$, $A = Z+Z^{T}$ with $Z \sim \mathcal{N}_{3\times 3}(0,1)$, and $b = (1,0,0)$ (labeled as the base point). The colorbar indicates the function value of $f_1$ on the sphere $\mathbb{S}^{2}$. In the text label of each point, the first component is the function value of $f_1$ and the second component is the distance regularization.
  • Figure 2: Comparison of $\mu$-aware and $\mu$-agnostic RAR.
  • Figure 3: Comparison of $\mu$-aware and $\mu$-agnostic RTR.

Theorems & Definitions (44)

  • Definition 1: Approximate second-order stationary point
  • Definition 2: Hölder continuity of objective's Hessian
  • Definition 3: Hölder continuity of retraction's differential
  • Proposition 1: Retraction properties
  • Proposition 2: Termination criteria
  • Lemma 1: Decomposition of total number of iterations
  • proof
  • Remark 1
  • Lemma 2: Number of successful iterations
  • proof
  • ...and 34 more