
Retraction-Free Decentralized Non-convex Optimization with Orthogonal Constraints

Youbang Sun, Shixiang Chen, Alfredo Garcia, Shahin Shahrampour

TL;DR

This work tackles decentralized non-convex optimization with orthogonal constraints on the Stiefel manifold, where traditional projection or retraction steps are computationally expensive. It introduces the Decentralized Retraction-Free Gradient Tracking (DRFGT) algorithm, a fully decentralized, infeasible-but-convergent method that uses a landing-field update to drive iterates toward feasibility without retractions. The authors establish an ergodic $\mathcal{O}(1/K)$ convergence rate and, under a local Riemannian PŁ condition, a local linear convergence rate, along with a safe-step-size analysis that ensures iterates stay within a neighborhood of the manifold. Numerical experiments on decentralized PCA with synthetic and real data corroborate the theory, showing DRFGT achieves competitive accuracy with substantially reduced computational overhead and favorable CPU-time performance compared to retraction-based methods.
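The landing idea can be illustrated on a single machine. The sketch below assumes the landing field from the retraction-free literature, $\Lambda(X) = \mathrm{skew}(\nabla f(X) X^\top) X + \lambda X(X^\top X - I_p)$, and a toy PCA cost $f(X) = -\tfrac{1}{2}\mathrm{tr}(X^\top A X)$. The matrix A, the step size eta, the penalty weight lam, and all helper names are illustrative assumptions, not the authors' implementation. Each step is a plain subtraction with no SVD, QR, or matrix inversion.

```python
# Sketch of a centralized landing iteration on the Stiefel manifold (no retraction).
# Assumed landing field (illustrative, not the authors' code):
#   Lambda(X) = skew(grad f(X) X^T) X + lam * X (X^T X - I)
# Toy PCA cost f(X) = -1/2 tr(X^T A X). Matrices are lists of rows (stdlib only).

def matmul(P, Q):
    """Dense product P @ Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def scale(a, P):
    return [[a * p for p in row] for row in P]

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def frob(P):
    """Frobenius norm."""
    return sum(x * x for row in P for x in row) ** 0.5

def feasibility(X):
    """Distance to orthonormality ||X^T X - I||_F."""
    XtX = matmul(transpose(X), X)
    return frob(sub(XtX, identity(len(XtX))))

def landing_field(X, grad, lam=1.0):
    G = grad(X)
    XtX = matmul(transpose(X), X)
    # Tangent part: skew(G X^T) X = (G X^T X - X G^T X) / 2, using only p x p products.
    tangent = scale(0.5, sub(matmul(G, XtX), matmul(X, matmul(transpose(G), X))))
    # Normal part: gradient of N(X) = 1/4 ||X^T X - I||_F^2, which is X (X^T X - I).
    normal = scale(lam, matmul(X, sub(XtX, identity(len(XtX)))))
    return add(tangent, normal)

A = [[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]  # toy covariance (assumed)

def grad_f(X):
    return scale(-1.0, matmul(A, X))  # Euclidean gradient of -1/2 tr(X^T A X)

def f(X):
    AX = matmul(A, X)
    return -0.5 * sum(x * ax for rx, rax in zip(X, AX) for x, ax in zip(rx, rax))

def landing(X, grad, eta=0.05, lam=1.0, iters=2000):
    for _ in range(iters):
        X = sub(X, scale(eta, landing_field(X, grad, lam)))  # plain step, no SVD/QR
    return X

X0 = [[1.0, 0.1], [0.1, 0.9], [0.2, 0.3]]  # infeasible start, ||X0^T X0 - I||_F ~ 0.37
X = landing(X0, grad_f)
print(feasibility(X0), feasibility(X), f(X))  # f(X) approaches -(3 + 2) / 2 = -2.5
```

Because the tangent term is a skew-symmetric matrix times X, it leaves $X^\top X$ unchanged to first order, so feasibility is restored by the penalty term alone. Meanwhile the tangent term drives X toward the top-2 eigenspace of A, where $f = -2.5$.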

Abstract

In this paper, we investigate decentralized non-convex optimization with orthogonal constraints. Conventional algorithms for this setting require either manifold retractions or other types of projection to ensure feasibility, both of which involve costly linear algebra operations (e.g., SVD or matrix inversion). In contrast, infeasible methods can deliver similar performance at lower computational cost. Inspired by this, we propose the first decentralized version of the retraction-free landing algorithm, called Decentralized Retraction-Free Gradient Tracking (DRFGT). We prove that DRFGT attains an ergodic convergence rate of $\mathcal{O}(1/K)$, matching the convergence rate of centralized, retraction-based methods. We further establish that, under a local Riemannian PŁ condition, DRFGT achieves a much faster linear convergence rate. Numerical experiments demonstrate that DRFGT performs on par with state-of-the-art retraction-based methods while substantially reducing computational overhead.
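A decentralized variant can be sketched as gradient tracking in which each agent tracks the network average of the local landing fields rather than of the gradients. The update rules, the four-agent ring with Metropolis weights, the local matrices As, and the step size alpha below are assumptions for illustration, not the exact DRFGT recursion or the paper's experimental setup. The matrix helpers are included so the block runs on its own.

```python
# Hypothetical DRFGT-style sketch (assumed updates, not the paper's code):
#   X_i <- sum_j W_ij X_j - alpha * Y_i
#   Y_i <- sum_j W_ij Y_j + Lambda_i(X_i new) - Lambda_i(X_i old)
# Local toy PCA costs f_i(X) = -1/2 tr(X^T A_i X); matrices are lists of rows.

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def sub(P, Q):
    return [[p - q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def scale(a, P):
    return [[a * p for p in row] for row in P]

def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def frob(P):
    return sum(x * x for row in P for x in row) ** 0.5

def average(mats):
    out = mats[0]
    for M in mats[1:]:
        out = add(out, M)
    return scale(1.0 / len(mats), out)

def landing_field(X, A, lam=1.0):
    """Local landing field for f_i(X) = -1/2 tr(X^T A X)."""
    G = scale(-1.0, matmul(A, X))
    XtX = matmul(transpose(X), X)
    tangent = scale(0.5, sub(matmul(G, XtX), matmul(X, matmul(transpose(G), X))))
    normal = scale(lam, matmul(X, sub(XtX, identity(len(XtX)))))
    return add(tangent, normal)

def mix(W, mats):
    """One communication round: agent i receives sum_j W[i][j] * mats[j]."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[[sum(W[i][j] * mats[j][r][c] for j in range(len(mats))) for c in range(cols)]
             for r in range(rows)] for i in range(len(mats))]

def drfgt_sketch(X0, As, W, alpha=0.02, lam=1.0, iters=3000):
    Xs = [X0 for _ in As]                                  # common initialization
    fields = [landing_field(X, A, lam) for X, A in zip(Xs, As)]
    Ys = list(fields)                                      # trackers start at local fields
    for _ in range(iters):
        Xs_new = [sub(mX, scale(alpha, Y)) for mX, Y in zip(mix(W, Xs), Ys)]
        fields_new = [landing_field(X, A, lam) for X, A in zip(Xs_new, As)]
        Ys = [add(mY, sub(Fn, Fo)) for mY, Fn, Fo in zip(mix(W, Ys), fields_new, fields)]
        Xs, fields = Xs_new, fields_new
    return Xs

# Four agents on a ring with Metropolis weights (1/3 on self and each neighbour).
N = 4
W = [[1 / 3 if (j - i) % N in (0, 1, N - 1) else 0.0 for j in range(N)] for i in range(N)]
# Heterogeneous local covariances whose average is diag(3, 2, 1).
As = [[[3.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]],
      [[3.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]],
      [[3.0, 0.0, 1.0], [0.0, 2.0, 0.0], [1.0, 0.0, 1.0]],
      [[3.0, 0.0, -1.0], [0.0, 2.0, 0.0], [-1.0, 0.0, 1.0]]]
X0 = [[1.0, 0.1], [0.1, 0.9], [0.2, 0.3]]

Xs = drfgt_sketch(X0, As, W)
Xbar = average(Xs)
consensus_err = max(frob(sub(X, Xbar)) for X in Xs)
feas_err = frob(sub(matmul(transpose(Xbar), Xbar), identity(2)))
A_bar = average(As)
obj = -0.5 * sum(x * y for rx, ry in zip(Xbar, matmul(A_bar, Xbar)) for x, y in zip(rx, ry))
print(consensus_err, feas_err, obj)
```

With a doubly stochastic W, the trackers' average always equals the average local landing field. At a fixed point of this recursion the trackers agree and average to zero, so the agents reach consensus on a feasible critical point of the average cost without any agent calling a retraction.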

Paper Structure

This paper contains 23 sections, 22 theorems, 123 equations, 2 figures, 1 table, and 1 algorithm.

Key Result

Proposition 3.5

The merit function $\mathcal{L}(x)$ satisfies the following properties.

Figures (2)

  • Figure 1: Convergence of DRFGT (our work) compared to DRGTA in chen2021decentralized.
  • Figure 2: Experiment with MNIST data.

Theorems & Definitions (29)

  • Definition 3.1: Safety Region ablin2023infeasible
  • Proposition 3.5
  • Proposition 4.1: Safe Step Size in Networks
  • Lemma 4.2
  • Lemma 4.3
  • Theorem 4.4: Stability Conditions
  • Lemma 4.5
  • Corollary 4.6
  • Theorem 4.7: Global Convergence
  • Lemma 4.8: PŁ-QG
  • ...and 19 more
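Definition 3.1 (Safety Region) is cited from ablin2023infeasible. Below is a minimal membership test, assuming the region takes the form $\{X : \|X^\top X - I_p\|_F \le \varepsilon\}$ used in that line of work. The threshold eps = 0.5 and the helper names are illustrative. Proposition 4.1 supplies a step size that keeps networked iterates inside such a region; its bound is not reproduced here.

```python
# Safety-region check, assuming Definition 3.1 has the landing-literature form
#   St^eps = {X : ||X^T X - I_p||_F <= eps}.
# eps = 0.5 is an arbitrary illustrative choice, not a value from the paper.

def gram_deviation(X):
    """||X^T X - I||_F for a matrix given as a list of rows."""
    p = len(X[0])
    total = 0.0
    for a in range(p):
        for b in range(p):
            g = sum(row[a] * row[b] for row in X)  # entry (a, b) of X^T X
            total += (g - (1.0 if a == b else 0.0)) ** 2
    return total ** 0.5

def in_safety_region(X, eps=0.5):
    return gram_deviation(X) <= eps

E = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]        # on the manifold: deviation 0
X0 = [[1.0, 0.1], [0.1, 0.9], [0.2, 0.3]]       # deviation ~ 0.37
Big = [[1.5 * v for v in row] for row in E]     # deviation 1.25 * sqrt(2) ~ 1.77
print([in_safety_region(M) for M in (E, X0, Big)])  # [True, True, False]
```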