Riemannian coordinate descent algorithms on matrix manifolds
Andi Han, Pratik Jawanpuria, Bamdev Mishra
TL;DR
This work develops a general Riemannian coordinate descent framework for optimization on matrix manifolds, in which each update touches only a few tangent-space coordinates and feasibility is maintained through manifold retractions. It constructs manifold-specific tangent-basis parameterizations for the Stiefel, Grassmann, hyperbolic, symplectic, symmetric positive semidefinite (SPSD), and doubly stochastic manifolds, and introduces two algorithms: RCD, and a cheaper variant, RCDlin, based on a first-order approximation of the objective. Both come with convergence guarantees under randomized and cyclic coordinate selection. The framework is demonstrated on orthogonal Procrustes, PCA, orthogonal network distillation, Lorentz embeddings, and nearest-matrix problems, where it shows low per-iteration cost and convergence competitive with full-gradient methods.
Abstract
Many machine learning applications are naturally formulated as optimization problems on Riemannian manifolds. The main idea behind Riemannian optimization is to maintain the feasibility of the variables while moving along a descent direction on the manifold. This results in updating all the variables at every iteration. In this work, we provide a general framework for developing computationally efficient coordinate descent (CD) algorithms on matrix manifolds that allows updating only a few variables at every iteration while adhering to the manifold constraint. In particular, we propose CD algorithms for various manifolds such as Stiefel, Grassmann, (generalized) hyperbolic, symplectic, and symmetric positive (semi)definite. While the cost per iteration of the proposed CD algorithms is low, we further develop a more efficient variant via a first-order approximation of the objective function. We analyze their convergence and complexity, and empirically illustrate their efficacy in several applications.
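To make the idea of updating only a few variables while staying on the manifold concrete, here is a minimal sketch on the orthogonal group, applied to the orthogonal Procrustes cost 0.5 * ||A X - B||_F^2. It is an illustration under stated assumptions rather than the authors' RCD or RCDlin: the function name `rcd_orthogonal_procrustes`, the choice of a Givens rotation as the retraction along a single tangent coordinate, and the fixed step size are all choices made for this example. Each iteration reads and writes only two columns of X.

```python
import numpy as np

def rcd_orthogonal_procrustes(A, B, n_iters=2000, seed=0):
    """Randomized coordinate descent sketch for
    min_X 0.5 * ||A X - B||_F^2 subject to X^T X = I (X square).
    Hypothetical helper written for illustration only."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    AtA, AtB = A.T @ A, A.T @ B
    # Assumed conservative step 1/L: for this specific cost, the curvature
    # along any single Givens rotation is at most sqrt(2) * ||A^T B||_F.
    step = 1.0 / (np.sqrt(2.0) * np.linalg.norm(AtB) + 1e-12)
    X = np.eye(n)
    for _ in range(n_iters):
        # Pick one tangent coordinate, indexed by a pair of distinct columns.
        i, j = rng.choice(n, size=2, replace=False)
        xi, xj = X[:, i].copy(), X[:, j].copy()
        # Columns i and j of the Euclidean gradient A^T (A X - B).
        gi = AtA @ xi - AtB[:, i]
        gj = AtA @ xj - AtB[:, j]
        # Derivative of the cost along the tangent direction
        # X (e_i e_j^T - e_j e_i^T).
        g = xi @ gj - xj @ gi
        theta = -step * g
        c, s = np.cos(theta), np.sin(theta)
        # Right-multiplying by a Givens rotation changes only columns i and j
        # and keeps X exactly orthogonal (up to rounding).
        X[:, i] = c * xi - s * xj
        X[:, j] = s * xi + c * xj
    return X

# Usage on synthetic data: B is an orthogonally rotated copy of A.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
B = A @ Q
X = rcd_orthogonal_procrustes(A, B)

def cost(X):
    return 0.5 * np.linalg.norm(A @ X - B) ** 2
```

With `A.T @ A` and `A.T @ B` precomputed, each step in this sketch costs O(n^2), whereas forming the full gradient costs O(n^3). Because Givens rotations preserve the determinant, the iterates stay in the connected component of the starting point.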
