Riemannian Momentum Tracking: Distributed Optimization with Momentum on Compact Submanifolds

Jun Chen, Tianyi Zhu, Haishan Ye, Lina Liu, Guang Dai, Yong Liu, Yunliang Jiang, Ivor W. Tsang

TL;DR

The paper tackles decentralized optimization of a smooth finite-sum objective constrained to a compact submanifold across a network. It introduces RMTracking, a momentum-augmented distributed Riemannian optimizer that reorders momentum updates relative to gradient tracking to reduce bias and improve convergence. The authors prove an $\mathcal{O}(\frac{1-\beta}{K})$ rate for the Riemannian gradient average with a fixed step-size and show stationary-point convergence for small step-sizes, achieving a $\frac{1}{1-\beta}$ speedup over existing methods, corroborated by eigenvalue problem experiments. This advances scalable, multi-agent optimization on manifolds by incorporating momentum in a principled, convergent manner with practical performance gains.

Abstract

Gradient descent with momentum has been widely applied in various signal processing and machine learning tasks, demonstrating a notable empirical advantage over standard gradient descent. However, momentum-based distributed Riemannian algorithms have been scarcely explored. In this paper, we propose Riemannian Momentum Tracking (RMTracking), a decentralized optimization algorithm with momentum over a compact submanifold. Given the non-convex nature of compact submanifolds, the objective function, composed of a finite sum of smooth (possibly non-convex) local functions, is minimized across agents in an undirected and connected network graph. With a constant step-size, we establish an $\mathcal{O}(\frac{1-\beta}{K})$ convergence rate of the Riemannian gradient average for any momentum weight $\beta \in [0,1)$. In particular, RMTracking can achieve a convergence rate of $\mathcal{O}(\frac{1-\beta}{K})$ to a stationary point when the step-size is sufficiently small. To the best of our knowledge, RMTracking is the first decentralized algorithm to achieve exact convergence that is $\frac{1}{1-\beta}$ times faster than other related algorithms. Finally, we verify these theoretical claims through numerical experiments on eigenvalue problems.
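
For orientation, the setting described in the abstract can be written in a standard consensus form. The notation below is an assumption made for illustration and is not taken from the paper body: $n$ agents, local smooth functions $f_i$, a compact submanifold $\mathcal{M}$, local copies $x_i$, and a generic constant $C$. The second line only spells out the arithmetic behind the stated $\frac{1}{1-\beta}$ factor, assuming a comparable method without momentum has a bound of the form $C/K$.

```latex
% Decentralized problem: every agent holds a copy x_i on the manifold,
% and all copies must agree (consensus constraint).
\min_{x_1,\dots,x_n \in \mathcal{M}} \; \frac{1}{n}\sum_{i=1}^{n} f_i(x_i)
\quad \text{s.t.} \quad x_1 = x_2 = \cdots = x_n .

% Ratio of an assumed baseline bound C/K to the stated bound C(1-\beta)/K:
\frac{C/K}{C(1-\beta)/K} = \frac{1}{1-\beta},
\qquad \text{e.g. } \beta = 0.9 \;\Rightarrow\; \frac{1}{1-\beta} = 10 .
```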

Paper Structure

This paper contains 20 sections, 82 equations, 2 figures, 1 table, 1 algorithm.

Figures (2)

  • Figure 1: Numerical results on synthetic data with different momentum weights and single-step consensus, eigengap $\Delta = 0.8$, Graph: Ring, $n=16$, $\hat{\alpha}=0.02$.
  • Figure 2: Numerical results on synthetic data with different step-sizes and single-step consensus, eigengap $\Delta = 0.8$, Graph: Ring, $n=16$.
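
The captions above refer to a ring graph with $n=16$ agents and single-step consensus. A minimal sketch of what that setup could look like follows. The equal weights of 1/3 and the helper names `ring_mixing_matrix`, `second_largest_singular_value`, and `disagreement` are assumptions for illustration; the mixing weights actually used in the experiments are not stated here.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Assumption: each agent averages itself and its two ring neighbours
    with equal weight 1/3. The paper's actual weights are not given here.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0  # wrap around to close the ring
    return W


def second_largest_singular_value(W: np.ndarray) -> float:
    """Second-largest singular value; smaller means faster consensus."""
    s = np.linalg.svd(W, compute_uv=False)  # returned in descending order
    return float(s[1])


def disagreement(X: np.ndarray) -> float:
    """Frobenius distance of the stacked local variables from their average."""
    return float(np.linalg.norm(X - X.mean(axis=0, keepdims=True)))


n, d = 16, 5
W = ring_mixing_matrix(n)
sigma2 = second_largest_singular_value(W)

# One consensus step: each row (one agent's variable) is replaced by a
# weighted average of itself and its neighbours' rows.
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
X_next = W @ X

print("second-largest singular value:", sigma2)
print("disagreement before / after:", disagreement(X), disagreement(X_next))
```

Because this `W` is symmetric and doubly stochastic, one multiplication preserves the network average and shrinks the disagreement by at least the factor `sigma2`. On a manifold the averaged rows generally leave the constraint set, which is why Riemannian methods follow such a step with a projection or retraction.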

Theorems & Definitions (11)

  • proof (10 entries listed)
  • ...and 1 more