Riemannian coordinate descent algorithms on matrix manifolds

Andi Han, Pratik Jawanpuria, Bamdev Mishra

TL;DR

This work develops a general Riemannian coordinate-descent framework for optimization on matrix manifolds, enabling efficient updates that touch only a few tangent-space coordinates while maintaining feasibility via manifold retractions. It constructs manifold-specific tangent-basis parameterizations for Stiefel, Grassmann, hyperbolic, symplectic, and SPSD/doubly stochastic manifolds, and introduces two algorithms, RCD and a cheaper RCDlin, with convergence guarantees under randomized and cyclic coordinate selections. The framework is demonstrated on problems including Orthogonal Procrustes, PCA, orthogonal network distillation, Lorentz embeddings, and nearest-matrix tasks, showing favorable per-iteration costs and competitive convergence compared to full-gradient methods. Overall, the approach broadens scalable Riemannian optimization by delivering robust, low-cost coordinate updates across diverse geometries with practical impact for large-scale manifold-constrained learning and completion problems.

Abstract

Many machine learning applications are naturally formulated as optimization problems on Riemannian manifolds. The main idea behind Riemannian optimization is to maintain the feasibility of the variables while moving along a descent direction on the manifold. This results in updating all the variables at every iteration. In this work, we provide a general framework for developing computationally efficient coordinate descent (CD) algorithms on matrix manifolds that allows updating only a few variables at every iteration while adhering to the manifold constraint. In particular, we propose CD algorithms for various manifolds such as Stiefel, Grassmann, (generalized) hyperbolic, symplectic, and symmetric positive (semi)definite. While the cost per iteration of the proposed CD algorithms is low, we further develop a more efficient variant via a first-order approximation of the objective function. We analyze their convergence and complexity, and empirically illustrate their efficacy in several applications.

Paper Structure

This paper contains 48 sections, 16 theorems, 41 equations, 6 figures, 1 table, and 1 algorithm.

Key Result

Proposition 3.2

Consider a function $f : {\rm Gr}(n,p) \rightarrow {\mathbb R}$. Let the coordinate descent update at $[X]$ be given by ${\mathrm{Retr}}_{X}(-\eta \, \theta H_{ij} X) \coloneqq G_{ij}(- \eta \theta) X$ for $1\leq i < j \leq n$, where $\theta = \langle \nabla f(X), H_{ij} X \rangle$ and for some fixe
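
As a minimal sketch of the kind of update the proposition describes, the following Python snippet applies $X \leftarrow G_{ij}(-\eta\theta) X$ to a PCA-style objective. Several pieces are assumptions made for illustration and are not taken from the paper: the sign convention $H_{ij} = e_i e_j^\top - e_j e_i^\top$ with $G_{ij}(t) = \exp(t H_{ij})$, the objective $f(X) = -\tfrac{1}{2}\,\mathrm{tr}(X^\top A X)$, the step size, the uniformly random choice of $(i, j)$, and the function names `pca_grad_rows` and `givens_cd_step`.

```python
import numpy as np

def pca_grad_rows(A, X, i, j):
    # Rows i and j of the Euclidean gradient -A X of the assumed objective
    # f(X) = -0.5 * trace(X^T A X); only two rows of A are read.
    return -A[i] @ X, -A[j] @ X

def givens_cd_step(A, X, i, j, eta):
    """One coordinate step X <- G_ij(-eta * theta) X, modifying rows i and j in place."""
    gi, gj = pca_grad_rows(A, X, i, j)
    # theta = <grad f(X), H_ij X>, with the assumed basis H_ij = e_i e_j^T - e_j e_i^T,
    # so (H_ij X) has row i equal to X[j], row j equal to -X[i], and zeros elsewhere.
    theta = gi @ X[j] - gj @ X[i]
    t = -eta * theta
    c, s = np.cos(t), np.sin(t)
    xi, xj = X[i].copy(), X[j].copy()
    # G_ij(t) = exp(t H_ij) acts as a plane rotation on rows i and j only.
    X[i] = c * xi + s * xj
    X[j] = -s * xi + c * xj
    return X

# Small illustrative run with synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
n, p = 20, 3
B = rng.standard_normal((n, n))
A = B @ B.T / n                                   # symmetric positive semidefinite
X, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal columns
eta = 0.25 / np.linalg.norm(A, 2)                 # conservative step size (assumption)

def f(X):
    return -0.5 * np.trace(X.T @ A @ X)

f0 = f(X)
for _ in range(500):
    i, j = sorted(rng.choice(n, size=2, replace=False))
    givens_cd_step(A, X, i, j, eta)
f1 = f(X)
```

Because each step multiplies $X$ by a rotation, the columns of $X$ stay orthonormal without any re-orthogonalization, and each step costs $O(np)$ here since only two rows of $A$ and two rows of $X$ are touched.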

Figures (6)

  • Figure 1: The Procrustes problem with varying $p$: (a) $p=150$ and (b) $p=50$. (Top row) Comparing various algorithms in terms of flop counts. (Bottom row) Comparing various algorithms in terms of runtime. We observe that our RCD algorithm outperforms the baselines in terms of flop counts and is competitive in terms of runtime.
  • Figure 2: (a) & (b): Experiments on the PCA problem with $n = 200, p = 50$. In (a), we observe that our algorithm RCDlin achieves the fastest convergence due to its low per-iteration cost. In (b), we compare various strategies for basis selection: cyclic selection (-c) and uniformly random selection (-r) of the basis for TSD, RCD, and RCDlin, and selection without replacement (-nr) for RCDlin. We observe that cyclic selection and selection without replacement perform better than random selection. (c) & (d): Experiments on the Procrustes problem with $n = 200, p = 150$. In (c), we again observe that cyclic selection performs better than random selection. In (d), RCD performs competitively against the infeasible methods.
  • Figure 3: Experiments on the distillation problem. We observe that the proposed RCD algorithm performs better than the baselines both in terms of flop counts and runtime.
  • Figure 4: Experiments on the nearest matrix problem. We notice the utility of the block-update variants of our RCD and RCDlin algorithms in obtaining faster convergence.
  • Figure 5: Experiments on learning Lorentz (hyperbolic) embeddings. The performance of our RCDlin algorithms (with cyclic and time-cyclic basis selection) is competitive with RGD.
  • ...and 1 more figure

Theorems & Definitions (41)

  • Remark 3.1
  • Proposition 3.2
  • Proposition 3.3
  • Proposition 3.4
  • Lemma 3.5
  • Proposition 3.6
  • Proposition 3.7
  • Remark 3.8: Block coordinate updates
  • Proposition 3.9
  • Remark 3.10: CD on multinomial manifold
  • ...and 31 more