An Adaptive Algorithm for Bilevel Optimization on Riemannian Manifolds
Xu Shi, Rufeng Xiao, Rujun Jiang
TL;DR
We address Riemannian bilevel optimization (RBO), where choosing step sizes typically requires problem-specific curvature and Lipschitz constants. We propose AdaRHD, a fully adaptive hypergradient-descent method that sets step sizes via the inverse cumulative gradient norm, eliminating the need for prior knowledge of these parameters. Theoretical results show an $\mathcal{O}(1/\epsilon)$ iteration complexity to obtain an $\epsilon$-stationary point, with gradient and Hessian-vector product complexities mirroring those of non-adaptive methods; the result extends to retraction mappings without sacrificing the rate. Empirical results on simple and robust RBO problems demonstrate competitive performance and enhanced robustness, supporting AdaRHD as a practical, parameter-free solver for Riemannian bilevel problems. Future work includes single-loop adaptive schemes and stochastic extensions to further close the remaining complexity gaps.
Abstract
Existing methods for solving Riemannian bilevel optimization (RBO) problems require prior knowledge of the problem's first- and second-order information and of the curvature parameter of the Riemannian manifold to determine step sizes, which poses practical limitations when these parameters are unknown or computationally infeasible to obtain. In this paper, we introduce the Adaptive Riemannian Hypergradient Descent (AdaRHD) algorithm for solving RBO problems. To our knowledge, AdaRHD is the first method to incorporate a fully adaptive step size strategy that eliminates the need for problem-specific parameters in RBO. We prove that AdaRHD achieves an $\mathcal{O}(1/\epsilon)$ iteration complexity for finding an $\epsilon$-stationary point, thus matching the complexity of existing non-adaptive methods. Furthermore, we demonstrate that substituting retraction mappings for exponential mappings maintains the same complexity bound. Experiments demonstrate that AdaRHD achieves performance comparable to that of existing non-adaptive approaches while exhibiting greater robustness.
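
To make the idea of a parameter-free step size concrete, the following is a minimal sketch of the "inverse cumulative gradient norm" rule on a single-level toy problem. It is not AdaRHD itself: there is no lower-level problem and no hypergradient. The toy objective $f(x) = x^\top A x$ on the unit sphere, the normalization retraction, the accumulator form `acc += ||g||^2` with step `1/sqrt(acc)`, and all function names (`riemannian_grad`, `retract`, `adaptive_sphere_descent`) and the initial value `a0` are assumptions chosen for illustration.

```python
import math

def riemannian_grad(A, x):
    # Euclidean gradient of f(x) = x^T A x is 2 A x (A symmetric);
    # projecting it onto the tangent space at x gives the Riemannian gradient.
    n = len(x)
    g = [2.0 * sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    inner = sum(gi * xi for gi, xi in zip(g, x))
    return [gi - inner * xi for gi, xi in zip(g, x)]

def retract(x, v):
    # Normalization retraction on the unit sphere (used instead of the
    # exponential map; an illustrative choice).
    y = [xi + vi for xi, vi in zip(x, v)]
    norm = math.sqrt(sum(yi * yi for yi in y))
    return [yi / norm for yi in y]

def adaptive_sphere_descent(A, x0, iters=500, a0=1.0):
    # Assumed rule: step = 1 / sqrt(a0^2 + sum of squared gradient norms so far).
    # No Lipschitz or curvature constant is supplied by the user.
    x = list(x0)
    acc = a0 ** 2
    for _ in range(iters):
        g = riemannian_grad(A, x)
        acc += sum(gi * gi for gi in g)
        step = 1.0 / math.sqrt(acc)
        x = retract(x, [-step * gi for gi in g])
    return x

# Toy instance: the minimizer on the sphere is an eigenvector of the
# smallest eigenvalue of A.
A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
s = 1.0 / math.sqrt(3.0)
x_final = adaptive_sphere_descent(A, [s, s, s])
f_final = sum(x_final[i] * A[i][j] * x_final[j]
              for i in range(3) for j in range(3))
print(f_final)
```

The step size starts large and shrinks only as gradient norms accumulate, so it settles at a usable scale without being tuned against the smoothness constant of the problem. In the bilevel setting described above, the same kind of rule would have to be applied to the upper-level hypergradient and to the lower-level solver, which this sketch does not attempt.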
