Riemannian Momentum Tracking: Distributed Optimization with Momentum on Compact Submanifolds
Jun Chen, Tianyi Zhu, Haishan Ye, Lina Liu, Guang Dai, Yong Liu, Yunliang Jiang, Ivor W. Tsang
TL;DR
The paper tackles decentralized optimization of a smooth finite-sum objective constrained to a compact submanifold across a network. It introduces RMTracking, a momentum-augmented distributed Riemannian optimizer that reorders momentum updates relative to gradient tracking to reduce bias and improve convergence. The authors prove an $O((1-\beta)/K)$ rate for the Riemannian gradient average with a fixed step-size and show stationary-point convergence for small step-sizes, achieving a $\frac{1}{1-\beta}$ speedup over existing methods, corroborated by eigenvalue problem experiments. This advances scalable, multi-agent optimization on manifolds by incorporating momentum in a principled, convergent manner with practical performance gains.
Abstract
Gradient descent with momentum has been widely applied in various signal processing and machine learning tasks, demonstrating a notable empirical advantage over standard gradient descent. However, momentum-based distributed Riemannian algorithms have scarcely been explored. In this paper, we propose Riemannian Momentum Tracking (RMTracking), a decentralized optimization algorithm with momentum over a compact submanifold. Given the non-convex nature of compact submanifolds, the objective function, composed of a finite sum of smooth (possibly non-convex) local functions, is minimized across agents in an undirected and connected network graph. With a constant step-size, we establish an $\mathcal{O}(\frac{1-\beta}{K})$ convergence rate of the Riemannian gradient average for any momentum weight $\beta \in [0,1)$. In particular, RMTracking can achieve a convergence rate of $\mathcal{O}(\frac{1-\beta}{K})$ to a stationary point when the step-size is sufficiently small. To the best of our knowledge, RMTracking is the first decentralized algorithm to achieve exact convergence that is $\frac{1}{1-\beta}$ times faster than other related algorithms. Finally, we verify these theoretical claims through numerical experiments on eigenvalue problems.
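For concreteness, the setting described in the abstract is commonly written as a consensus problem over the submanifold. The notation below is assumed for illustration and is not taken from the abstract: $n$ is the number of agents, $f_i$ is the local function of agent $i$, $x_i$ is its local copy of the variable, and $\mathcal{M}$ is the compact submanifold. The second line shows the generic heavy-ball recursion in Euclidean space under a constant gradient $g$ with $m_0 = 0$. It is not the RMTracking update itself and serves only as the usual intuition for where a factor of $\frac{1}{1-\beta}$ can arise.

$$
\min_{x_1,\dots,x_n \in \mathcal{M}} \ \frac{1}{n}\sum_{i=1}^{n} f_i(x_i) \quad \text{s.t.} \quad x_1 = x_2 = \cdots = x_n
$$

$$
m_k = \beta\, m_{k-1} + g \ \Longrightarrow\ m_k = \frac{1-\beta^{k}}{1-\beta}\, g \ \longrightarrow\ \frac{g}{1-\beta} \quad (k \to \infty)
$$

Under these assumptions, the accumulated step is amplified by $\frac{1}{1-\beta}$ relative to a plain gradient step, which matches the form of the speedup factor stated above.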
