Optimization without Retraction on the Random Generalized Stiefel Manifold

Simon Vary; Pierre Ablin; Bin Gao; P. -A. Absil

TL;DR

The paper tackles optimization over a stochastic generalized Stiefel constraint $\mathrm{St}_{B}(p, n)$, where $B=\mathbb{E}[B_\zeta]$ is unknown. It introduces a cheap stochastic landing update that converges to $\varepsilon$-critical points in expectation without explicit retractions, matching the convergence rates of Riemannian methods while relying only on matrix multiplications and stochastic estimates of the constraint. The authors establish deterministic and stochastic convergence guarantees and show that the per-iteration cost scales favorably (e.g., $\mathcal{O}(npr)$ with batch size $r$) and that memory usage scales as $\mathcal{O}(n(p+r))$, because $B$ is never fully formed. Empirical results on generalized eigenvalue problems, stochastic CCA, and ICA demonstrate fast convergence, robustness to parameter choices, and memory efficiency, validating the method’s practical impact for problems with generalized orthogonality constraints.
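The cost and memory figures above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a stochastic estimate takes the sample-covariance form $B_\zeta = \frac{1}{r} Z^\top Z$, where the $r$ rows of $Z$ are data samples; the helper name `batch_apply_B` is hypothetical. Under that assumption, the product $B_\zeta X$ needs only two thin matrix multiplications and never materializes an $n \times n$ matrix.

```python
import numpy as np

def batch_apply_B(Z, X):
    """Return (Z.T @ Z / r) @ X without forming the n x n matrix.

    Z : (r, n) mini-batch of samples (assumed form of the estimate B_zeta)
    X : (n, p) current iterate
    Two products of cost O(n p r) each; storage is O(n (p + r)).
    """
    r = Z.shape[0]
    return Z.T @ (Z @ X) / r

rng = np.random.default_rng(0)
n, p, r = 50, 5, 8
Z = rng.standard_normal((r, n))
X = rng.standard_normal((n, p))

BX = batch_apply_B(Z, X)

# Reference computation that does form the n x n estimate, for comparison only.
BX_explicit = (Z.T @ Z / r) @ X
max_abs_diff = float(np.max(np.abs(BX - BX_explicit)))
```

The parenthesization `Z.T @ (Z @ X)` is the design choice that matters here: evaluating `(Z.T @ Z) @ X` instead would cost $\mathcal{O}(n^2 r + n^2 p)$ time and $\mathcal{O}(n^2)$ memory.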

Abstract

Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.
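To make the idea of iterating without enforcing the constraint concrete, here is a hedged sketch of a landing-style iteration for a small deterministic generalized eigenvalue problem, minimizing $f(X) = -\tfrac{1}{2}\operatorname{tr}(X^\top A X)$ subject to $X^\top B X = I_p$. The update direction used below, $2\,\mathrm{skew}(\nabla f(X) X^\top B)\,BX + \omega\, BX(X^\top B X - I_p)$, along with the step size `eta`, the weight `omega`, and all function names, are assumptions chosen for illustration and are not stated in this summary; the paper should be consulted for the exact landing field and step-size conditions. The sketch uses the full matrix $B$; a stochastic variant would replace it with random estimates.

```python
import numpy as np

def skew(M):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (M - M.T)

def landing_step(X, A, B, eta, omega):
    """One landing-style step for f(X) = -0.5 * trace(X.T @ A @ X).

    Assumed direction (illustrative, not quoted from the source):
        2 * skew(grad_f(X) X^T B) B X  +  omega * B X (X^T B X - I)
    No retraction or matrix factorization is used, only matrix products.
    """
    p = X.shape[1]
    BX = B @ X
    grad_f = -A @ X                                  # Euclidean gradient of f
    objective_part = 2.0 * skew(grad_f @ BX.T) @ BX  # B symmetric, so X^T B = (B X)^T
    penalty_part = BX @ (X.T @ BX - np.eye(p))       # gradient of 0.25 * ||X^T B X - I||_F^2
    return X - eta * (objective_part + omega * penalty_part)

def infeasibility(X, B):
    """Frobenius distance of X^T B X from the identity."""
    return float(np.linalg.norm(X.T @ B @ X - np.eye(X.shape[1])))

def objective(X, A):
    return float(-0.5 * np.trace(X.T @ A @ X))

rng = np.random.default_rng(0)
n, p = 20, 3
M = rng.standard_normal((n, n))
B = np.eye(n) + 0.1 * (M @ M.T) / n        # symmetric positive definite
S = rng.standard_normal((n, n))
A = (S + S.T) / (2.0 * np.sqrt(n))         # symmetric
X0, _ = np.linalg.qr(rng.standard_normal((n, p)))  # orthonormal, but not B-orthonormal

X = X0.copy()
for _ in range(3000):
    X = landing_step(X, A, B, eta=0.02, omega=5.0)

infeas_start, infeas_end = infeasibility(X0, B), infeasibility(X, B)
f_start, f_end = objective(X0, A), objective(X, A)
```

In this sketch the objective term does not change $X^\top B X$ to first order, so the penalty term alone pulls the iterate toward the constraint set while the objective term reduces $f$. The starting point is deliberately infeasible to show that feasibility is reached gradually and is not imposed at every step.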

Paper Structure (35 sections, 11 theorems, 83 equations, 9 figures, 2 tables)

Introduction
Prior work related to optimization on the generalized Stiefel manifold
Riemannian optimization.
Infeasible optimization methods.
Existing methods for the GEVP and CCA
Deterministic methods.
Stochastic methods.
Comparison with the landing.
Landing on General Stochastic Constraints
Deterministic case
Stochastic case
Landing on the Generalized Stiefel Manifold
Deterministic generalized Stiefel case
Stochastic generalized Stiefel case
Numerical Experiments
...and 20 more sections

Key Result

Proposition 2.3

The Riemannian gradient defined in Definition 2.2 is a relative ascent direction on $\mathcal{M}^\varepsilon$ with $\rho = 1$.

Figures (9)

Figure 1: Illustration of the landing field and the random feasible set.
Figure 2: Generalized eigenvalue problem ($n = 1000, p = 500$).
Figure 4: Stochastic ICA on the synthetic dataset for $n=10$.
Figure 5: Deterministic computation of the generalized eigenvalue problem with $n = 1000, p = 500$, the condition number of the two matrices $\kappa_B = \kappa_A =100$. Each algorithm is given a time limit of $120$ seconds.
Figure 6: Stochastic canonical correlation analysis on the split MNIST dataset for $p=10$ canonical components.
...and 4 more figures

Theorems & Definitions (24)

Definition 2.1: Relative ascent direction
Definition 2.2: Riemannian gradient on the layered manifold $\mathcal{M}_c$
Proposition 2.3: Riemannian gradient is a relative ascent direction
Proposition 2.4: Lipschitz constant of Fletcher's augmented Lagrangian
proof
Lemma 2.5: A step-size safeguard
Lemma 2.6: A lower-bound on the step-size safeguard
Lemma 2.7
Theorem 2.8: Convergence of the deterministic landing
Theorem 2.9: Convergence of the stochastic landing
...and 14 more
