A Variance-Reduced Stochastic Gradient Tracking Algorithm for Decentralized Optimization with Orthogonality Constraints

Lei Wang, Xin Liu

TL;DR

The paper tackles decentralized optimization on the Stiefel manifold with orthogonality constraints, proposing VRSGT to simultaneously reduce sampling and communication costs. It introduces augmented Lagrangian estimation and gradient approximation to handle constraints and compute descent directions efficiently, achieving an $O(1/k)$ convergence rate to a stationary point. The authors provide thorough convergence analysis under mild local smoothness assumptions and demonstrate strong empirical performance on decentralized PCA and DPCP tasks, including autonomous driving scenarios. The work offers practical, scalable tools for distributed learning with nonconvex orthogonality constraints and highlights potential for broader applicability in deep learning and real-time sensing systems.

Abstract

Decentralized optimization with orthogonality constraints arises widely in scientific computing and data science. Since the orthogonality constraints are nonconvex, designing efficient algorithms is quite challenging. Existing approaches leverage geometric tools from Riemannian optimization to solve this problem at the cost of high sample and communication complexities. To relieve this difficulty, based on two novel techniques that can waive the orthogonality constraints, we propose a variance-reduced stochastic gradient tracking (VRSGT) algorithm with a convergence rate of $O(1 / k)$ to a stationary point. To the best of our knowledge, VRSGT is the first algorithm for decentralized optimization with orthogonality constraints that reduces both sampling and communication complexities simultaneously. In the numerical experiments, VRSGT shows promising performance in a real-world autonomous driving application.
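For readers unfamiliar with the gradient tracking idea that VRSGT builds on, the following is a minimal sketch of plain decentralized gradient tracking on a toy problem. It is not VRSGT: it uses exact local gradients (no variance reduction) and has no orthogonality constraints. The scalar objective $f_i(x) = \frac{1}{2}(x - b_i)^2$, the 4-agent ring, the mixing matrix `W`, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def gradient_tracking(b, W, step=0.1, iters=500):
    """Plain gradient tracking for local objectives f_i(x) = 0.5 * (x - b_i)^2.

    Illustrative only: exact gradients, unconstrained, scalar variable per agent.
    """
    grad = lambda x: x - b              # stacked local gradients, one entry per agent
    x = np.zeros_like(b)                # local iterates
    y = grad(x)                         # trackers start at the local gradients
    for _ in range(iters):
        x_new = W @ x - step * y        # mix with neighbors, then step along the tracker
        y = W @ y + grad(x_new) - grad(x)  # refresh the estimate of the average gradient
        x = x_new
    return x

# Hypothetical 4-agent ring with a symmetric, doubly stochastic mixing matrix.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
b = np.array([1.0, 2.0, 3.0, 6.0])      # made-up local data
x_final = gradient_tracking(b, W)
```

In this toy setting every agent's iterate approaches the minimizer of the average objective, which is the mean of the `b` values, even though each agent only sees its own `b_i` and exchanges information with its two ring neighbors.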

Paper Structure

This paper contains 30 sections, 8 theorems, 68 equations, 3 figures, 2 tables, 1 algorithm.

Key Result

Lemma 2.1

Let $\mathcal{R} := \{X \in \mathbb{R}^{n \times p} \mid \|X^{\top} X - I_p\|_{\mathrm{F}} \leq 1 / 6\}$ be a bounded region and $M := \sup \{ \| \nabla f (X) \|_{\mathrm{F}} \mid X \in \mathcal{R}\}$ be a positive constant. Then if $\beta \geq (6 + 21 M) / 5$, we have for any $X \in \mathcal{R}$.
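The quantities named in the lemma's hypotheses can be made concrete with a short sketch. The code below only evaluates the feasibility violation $\|X^{\top} X - I_p\|_{\mathrm{F}}$, tests membership in $\mathcal{R}$, and computes the threshold $(6 + 21M)/5$ for $\beta$. The function names are hypothetical, and $M$ is treated as a user-supplied value because the supremum over $\mathcal{R}$ depends on the objective $f$.

```python
import numpy as np

def feasibility_violation(X):
    """Frobenius-norm distance of X^T X from the identity."""
    p = X.shape[1]
    return np.linalg.norm(X.T @ X - np.eye(p), ord="fro")

def in_region(X, radius=1.0 / 6.0):
    """Membership test for R = {X : ||X^T X - I_p||_F <= 1/6}."""
    return feasibility_violation(X) <= radius

def beta_lower_bound(M):
    """Threshold (6 + 21 M) / 5 from the lemma; M is assumed to be given."""
    return (6.0 + 21.0 * M) / 5.0

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))  # orthonormal columns
X_near = 1.01 * Q   # slightly infeasible, still inside R
X_far = 2.0 * Q     # far from the Stiefel manifold, outside R
```

Here `Q` and `X_near` pass `in_region`, while `X_far` does not, which illustrates that $\mathcal{R}$ is a neighborhood of the Stiefel manifold rather than the manifold itself.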

Figures (3)

  • Figure 1: Comparison of VRSGT for different values of $\beta$.
  • Figure 2: Comparison between VRSGT and DRSGD in solving the decentralized PCA problem.
  • Figure 3: Recovery results of four frames from the KITTI dataset with inliers in blue and outliers in red. Both inliers and outliers are detected by using a threshold on the distance to the hyperplane recovered by each tested algorithm. The results are represented by projecting 3D point clouds onto the image.

Theorems & Definitions (17)

  • Definition 1.1 [Wang2021multipliers]
  • Lemma 2.1
  • Definition 2.2
  • Theorem 3.1
  • proof: Proof of Lemma \ref{le:expen}
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma B.3
  • ...and 7 more