A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints

Lei Wang; Xin Liu

TL;DR

The paper tackles decentralized optimization under orthogonality constraints (that is, over the Stiefel manifold) and proposes a variance-reduced stochastic gradient tracking (VRSGT) algorithm that reduces sampling and communication costs simultaneously. It introduces two techniques, augmented Lagrangian estimation and gradient approximation, to handle the constraints and compute descent directions efficiently, achieving an $O(1/k)$ convergence rate to a stationary point. The authors provide a convergence analysis under local smoothness assumptions and report strong empirical performance on decentralized PCA and dual principal component pursuit (DPCP) tasks, including an autonomous driving scenario. The work offers scalable tools for distributed learning with nonconvex orthogonality constraints and points to possible applications in deep learning and real-time sensing systems.

Abstract

Decentralized optimization with orthogonality constraints arises widely in scientific computing and data science. Since the orthogonality constraints are nonconvex, it is quite challenging to design efficient algorithms. Existing approaches leverage geometric tools from Riemannian optimization to solve this problem at the cost of high sample and communication complexities. To relieve this difficulty, based on two novel techniques that can waive the orthogonality constraints, we propose a variance-reduced stochastic gradient tracking (VRSGT) algorithm with a convergence rate of $O(1 / k)$ to a stationary point. To the best of our knowledge, VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously. In the numerical experiments, VRSGT shows promising performance in a real-world autonomous driving application.

Paper Structure (30 sections, 8 theorems, 68 equations, 3 figures, 2 tables, 1 algorithm)

Introduction
Decentralized Formulation
Existing Works
Riemannian SVRG methods.
Decentralized algorithms in the Euclidean space.
Decentralized Riemannian gradient descent methods.
Our Contributions
Notations
Algorithm Design
Augmented Lagrangian Estimation
Gradient Approximation
Algorithmic Development
Step 1: $\mathbf{X}$-update.
Step 2: $\mathbf{S}$-update.
Step 3: $\mathbf{D}$-update.
...and 15 more sections

Key Result

Lemma 2.1

Let $\mathcal{R} := \{X \in \mathbb{R}^{n \times p} \mid \|X^{\top} X - I_p\|_{\mathrm{F}} \leq 1 / 6\}$ be a bounded region and $M := \sup \{ \| \nabla f (X) \|_{\mathrm{F}} \mid X \in \mathcal{R}\}$ be a positive constant. Then if $\beta \geq (6 + 21 M) / 5$, we have for any $X \in \mathcal{R}$.
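The hypotheses of Lemma 2.1 can be checked numerically. The following is a minimal sketch in plain Python, not code from the paper: the helper names `gram_residual_norm`, `in_region`, and `beta_lower_bound` are hypothetical, and the value of $M$ is assumed to be supplied by the user, since it depends on the objective $f$.

```python
import math


def gram_residual_norm(X):
    """Frobenius norm of X^T X - I_p, with X given as a list of n rows of length p."""
    n, p = len(X), len(X[0])
    total = 0.0
    for i in range(p):
        for j in range(p):
            # (i, j) entry of the Gram matrix X^T X
            g = sum(X[k][i] * X[k][j] for k in range(n))
            r = g - (1.0 if i == j else 0.0)
            total += r * r
    return math.sqrt(total)


def in_region(X, radius=1.0 / 6.0):
    """Membership test for R = {X : ||X^T X - I_p||_F <= 1/6}."""
    return gram_residual_norm(X) <= radius


def beta_lower_bound(M):
    """Threshold (6 + 21 M) / 5 on beta from Lemma 2.1; M is an assumed input."""
    return (6.0 + 21.0 * M) / 5.0


# Illustrative 3 x 2 matrices (made-up data, not from the paper).
X_orth = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]        # exactly orthonormal columns
X_scaled = [[1.05 * v for v in row] for row in X_orth]  # slightly infeasible
X_far = [[2.0 * v for v in row] for row in X_orth]      # far from feasible

print(in_region(X_orth), in_region(X_scaled), in_region(X_far))
print(beta_lower_bound(1.0))
```

Here `X_scaled` stays inside $\mathcal{R}$ because its residual norm is $0.1025\sqrt{2} \approx 0.145 \leq 1/6$, while `X_far` falls outside it. The sketch only tests the lemma's assumptions; it does not evaluate the lemma's conclusion.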

Figures (3)

Figure 1: Comparison of VRSGT for different values of $\beta$.
Figure 2: Comparison between VRSGT and DRSGD in solving the decentralized PCA problem.
Figure 3: Recovery results of four frames from the KITTI dataset with inliers in blue and outliers in red. Both inliers and outliers are detected by using a threshold on the distance to the hyperplane recovered by each tested algorithm. The results are represented by projecting 3D point clouds onto the image.

Theorems & Definitions (17)

Definition 1.1: Wang2021multipliers
Lemma 2.1
Definition 2.2
Theorem 3.1
proof: Proof of Lemma \ref{le:expen}
Lemma B.1
proof
Lemma B.2
proof
Lemma B.3
...and 7 more
