Retraction-Free Decentralized Non-convex Optimization with Orthogonal Constraints

Youbang Sun; Shixiang Chen; Alfredo Garcia; Shahin Shahrampour

TL;DR

This work tackles decentralized non-convex optimization with orthogonal constraints on the Stiefel manifold, where traditional projection or retraction steps are computationally expensive. It introduces the Decentralized Retraction-Free Gradient Tracking (DRFGT) algorithm, a fully decentralized, infeasible-but-convergent method that uses a landing-field update to drive iterates toward feasibility without retractions. The authors establish an ergodic $\mathcal{O}(1/K)$ convergence rate and, under a local Riemannian PŁ condition, a local linear convergence rate, along with a safe-step-size analysis that ensures iterates stay within a neighborhood of the manifold. Numerical experiments on decentralized PCA with synthetic and real data corroborate the theory, showing DRFGT achieves competitive accuracy with substantially reduced computational overhead and favorable CPU-time performance compared to retraction-based methods.

Abstract

We investigate decentralized non-convex optimization with orthogonal constraints. Conventional algorithms for this setting rely on manifold retractions or other projections to maintain feasibility, and both involve costly linear algebra operations (e.g., SVD or matrix inversion). Infeasible methods, by contrast, can deliver similar performance at lower computational cost. Inspired by this, we propose the first decentralized version of the retraction-free landing algorithm, called Decentralized Retraction-Free Gradient Tracking (DRFGT). We prove that DRFGT attains an ergodic convergence rate of $\mathcal{O}(1/K)$, matching the convergence rate of centralized, retraction-based methods. We further establish that, under a local Riemannian PŁ condition, DRFGT achieves a much faster linear convergence rate. Numerical experiments demonstrate that DRFGT performs on par with state-of-the-art retraction-based methods with substantially reduced computational overhead.
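To make the retraction-free idea concrete, here is a minimal single-node sketch of a landing-style update for PCA. It is not the DRFGT algorithm itself: the consensus and gradient-tracking steps are omitted, and the field used here, $\Lambda(x) = \mathrm{skew}(\nabla f(x) x^\top) x + \lambda\, x (x^\top x - I)$, is an assumption taken from the landing literature rather than from the text above. The names `landing_field`, `step`, and `lam`, the problem sizes, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

def skew(m):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (m - m.T)

def landing_field(x, euclid_grad, lam):
    """Return the two components of an assumed landing field at x (d x r).

    rel: a retraction-free stand-in for the Riemannian gradient.
    pen: lam times the gradient of N(x) = 0.25 * ||x^T x - I||_F^2,
         which pulls x back toward the Stiefel manifold.
    """
    rel = skew(euclid_grad @ x.T) @ x
    pen = lam * (x @ (x.T @ x - np.eye(x.shape[1])))
    return rel, pen

rng = np.random.default_rng(0)
d, r = 20, 3                      # hypothetical problem size
b = rng.standard_normal((100, d))
a = b.T @ b
a /= np.linalg.norm(a, 2)         # scale so the spectral norm of a is 1

# Start from a feasible point (orthonormal columns).
x, _ = np.linalg.qr(rng.standard_normal((d, r)))
step, lam = 0.05, 1.0             # assumed step size and penalty weight

for _ in range(500):
    grad = -a @ x                 # Euclidean gradient of f(x) = -0.5 * tr(x^T a x)
    rel, pen = landing_field(x, grad, lam)
    x = x - step * (rel + pen)    # no QR, SVD, or matrix inverse in the loop

# Distance to feasibility and orthogonality of the two field components.
feas_err = np.linalg.norm(x.T @ x - np.eye(r))
rel, pen = landing_field(x, -a @ x, lam)
inner = float(np.sum(rel * pen))
```

The loop uses only matrix products, which is where the computational saving over retraction-based updates comes from. The two components are orthogonal in the Frobenius inner product (so `inner` is zero up to rounding), and with a small enough step the iterates remain near the manifold without ever being projected onto it.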

Paper Structure (23 sections, 22 theorems, 123 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Contributions
Related Literature
Optimization on Manifolds
Decentralized Extensions of Riemannian Optimization
Preliminaries
Notations
Optimization on the Stiefel Manifold
Technical Assumptions
A Smooth Merit Function and the Optimality Condition
Main Results
Decentralized Retraction-Free Gradient Tracking
Linear System Analysis
Global Convergence of DRFGT
Local Linear Convergence of DRFGT Under Local PŁ Condition
...and 8 more sections

Key Result

Proposition 3.5

The merit function $\mathcal{L}(x)$ satisfies the following properties.

Figures (2)

Figure 1: Convergence of DRFGT (our work) compared to DRGTA [chen2021decentralized].
Figure 2: Experiment with MNIST data.

Theorems & Definitions (29)

Definition 3.1: Safety Region [ablin2023infeasible]
Proposition 3.5
Proposition 4.1: Safe Step Size in Networks
Lemma 4.2
Lemma 4.3
Theorem 4.4: Stability Conditions
Lemma 4.5
Corollary 4.6
Theorem 4.7: Global Convergence
Lemma 4.8: PŁ-QG
...and 19 more
