Riemannian Adaptive Regularized Newton Methods with Hölder Continuous Hessians
Chenyu Zhang, Rujun Jiang
TL;DR
This work addresses nonconvex optimization on complete Riemannian manifolds under Hölder-smooth Hessians and Hölder-smooth retractions. It introduces a unified Riemannian adaptive regularized Newton (RARN) framework that encompasses both Riemannian trust region (RTR) and adaptive regularization (RAR) methods, and derives nonasymptotic iteration and Hessian-vector operation complexity bounds driven by the shared parameter $\alpha=\min\{\mu,\nu,\theta\}$. The main contributions are: (i) a general $2+\omega$ regularization scheme with optimal rate achieved when $\omega=\alpha$, (ii) explicit iteration and operation complexity bounds for both RTR and RAR that recover the classical $O(\varepsilon^{-3/2})$ rate when $\alpha=1$, and (iii) concrete subproblem solvers (Krylov/Lanczos and Minimal Eigenvalue Oracles) yielding sharp Hessian-vector product counts. The results offer practical guidelines for choosing regularization order and inexactness to achieve optimal convergence on manifolds, with experimental evidence showing improved performance when exploiting Hölder-smoothness information.
Abstract
This paper presents strong worst-case iteration and operation complexity guarantees for Riemannian adaptive regularized Newton methods, a unified framework encompassing both Riemannian adaptive regularization (RAR) methods and Riemannian trust region (RTR) methods. We comprehensively characterize the sources of approximation in second-order manifold optimization methods: the objective function's smoothness, retraction's smoothness, and subproblem solver's inexactness. Specifically, for a function with a $\mu$-Hölder continuous Hessian, when equipped with a retraction featuring a $\nu$-Hölder continuous differential and a $\theta$-inexact subproblem solver, both RTR and RAR with $2+\alpha$ regularization (where $\alpha=\min\{\mu,\nu,\theta\}$) locate an $(\varepsilon,\varepsilon^{\alpha/(1+\alpha)})$-approximate second-order stationary point within at most $O(\varepsilon^{-(2+\alpha)/(1+\alpha)})$ iterations and at most $\tilde{O}(\varepsilon^{-(4+3\alpha)/(2(1+\alpha))})$ Hessian-vector products. These complexity results are novel and sharp, and reduce to an iteration complexity of $O(\varepsilon^{-3/2})$ and an operation complexity of $\tilde{O}(\varepsilon^{-7/4})$ when $\alpha=1$.
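As an illustration of how the stated exponents depend on the three Hölder and inexactness parameters, the following minimal sketch evaluates them exactly with rational arithmetic. The function name `rarn_exponents` and the dictionary keys are hypothetical and not from the paper; the code only evaluates the formulas quoted in the abstract, ignoring constants and the logarithmic factors hidden in the $\tilde{O}$ notation.

```python
from fractions import Fraction


def rarn_exponents(mu, nu, theta):
    """Evaluate the exponents quoted in the abstract (illustrative helper).

    mu, nu, theta: Hoelder exponent of the Hessian, Hoelder exponent of the
    retraction's differential, and inexactness order of the subproblem solver.
    Each returned value p is an exponent: bounds read eps**(-p), and the
    second-order tolerance reads eps**p.
    """
    # The shared parameter alpha = min{mu, nu, theta} drives every bound.
    alpha = min(Fraction(mu), Fraction(nu), Fraction(theta))
    return {
        "alpha": alpha,
        # Second-order tolerance: eps ** (alpha / (1 + alpha)).
        "second_order_tolerance": alpha / (1 + alpha),
        # Iteration bound: O(eps ** -((2 + alpha) / (1 + alpha))).
        "iterations": (2 + alpha) / (1 + alpha),
        # Hessian-vector product bound, up to log factors:
        # eps ** -((4 + 3 * alpha) / (2 * (1 + alpha))).
        "hessian_vector_products": (4 + 3 * alpha) / (2 * (1 + alpha)),
    }


# Lipschitz case: recovers the exponents 3/2 and 7/4 quoted in the abstract.
lipschitz = rarn_exponents(1, 1, 1)

# A weaker case in which the Hessian is only 1/2-Hoelder continuous.
half = rarn_exponents(Fraction(1, 2), 1, 1)
```

With all three parameters equal to 1 the sketch returns iteration exponent 3/2 and Hessian-vector exponent 7/4, matching the abstract. With $\mu=1/2$ the same formulas give 5/3 and 11/6, which shows that the weakest of the three parameters determines the rate.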
