Gauss-Southwell type descent methods for low-rank matrix optimization

Guillaume Olikier; André Uschmajew; Bart Vandereycken

TL;DR

The paper addresses rank-constrained low-rank matrix optimization by working with a factorized representation and two Gauss-Southwell-type descent schemes. It develops a balanced factorized method and an embedded Riemannian method based on tangent-space projections, proves global gradient-type convergence for both, and establishes a global $O(1/\sqrt{\ell})$ rate for the Riemannian variant. A local linear convergence result is established at points with a positive-definite Riemannian Hessian, and the analysis yields new convergence results for alternating least squares. Numerical experiments show that the Riemannian approach is more robust to small singular values and ill-conditioning than the balanced factorization, converging faster and more stably, particularly for ill-conditioned problems and across different line-search settings.
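The embedded Riemannian method relies on projecting the Euclidean gradient onto the tangent space of the manifold of rank-$k$ matrices. The sketch below implements the standard projection at $X = U S V^\top$, assuming $U$ and $V$ have orthonormal columns. The split into a "row" part (which only moves $R$ in $X = U R^\top$) and a "column" part (which only moves $L$ in $X = L V^\top$), and the rule of picking the part with the larger norm, are our assumptions about how a Gauss-Southwell selection could act on the Riemannian gradient; the helper names are hypothetical.

```python
import numpy as np

def tangent_projection(U, V, G):
    """Orthogonal projection of G onto the tangent space at X = U S V^T of the
    rank-k manifold: U U^T G + G V V^T - U U^T G V V^T."""
    UtG = U.T @ G
    GV = G @ V
    return U @ UtG + GV @ V.T - U @ (UtG @ V) @ V.T

def split_tangent_gradient(U, V, G):
    """Hypothetical split of the projected gradient into two orthogonal parts.

    row_part = U U^T G               -> only changes R in X = U R^T
    col_part = (I - U U^T) G V V^T   -> only changes L in X = L V^T
    """
    row_part = U @ (U.T @ G)
    GV = G @ V
    col_part = (GV - U @ (U.T @ GV)) @ V.T
    return row_part, col_part

def gauss_southwell_pick(U, V, G):
    """Assumed selection rule: update the factor whose part has the larger norm."""
    row_part, col_part = split_tangent_gradient(U, V, G)
    return "R" if np.linalg.norm(row_part) >= np.linalg.norm(col_part) else "L"

# Random orthonormal bases and a random Euclidean gradient G.
rng = np.random.default_rng(0)
m, n, k = 8, 6, 2
U, _ = np.linalg.qr(rng.standard_normal((m, k)))
V, _ = np.linalg.qr(rng.standard_normal((n, k)))
G = rng.standard_normal((m, n))
P = tangent_projection(U, V, G)
row_part, col_part = split_tangent_gradient(U, V, G)
choice = gauss_southwell_pick(U, V, G)
```

The two parts are orthogonal and sum to the Riemannian gradient, so comparing their norms is a natural analogue of choosing the block with the largest partial gradient.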

Abstract

We consider gradient-related methods for low-rank matrix optimization with a smooth cost function. The methods operate on single factors of the low-rank factorization and share aspects of both alternating and Riemannian optimization. Two possible choices for the search directions based on Gauss-Southwell type selection rules are compared: one using the gradient of a factorized non-convex formulation, the other using the Riemannian gradient. While both methods provide gradient convergence guarantees that are similar to the unconstrained case, numerical experiments on a quadratic cost function indicate that the version based on the Riemannian gradient is significantly more robust with respect to small singular values and the condition number of the cost function. As a side result of our approach, we also obtain new convergence results for the alternating least squares method.
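The first choice of search direction uses the gradient of the factorized objective $g(L, R) = f(LR^\top)$, whose partial gradients are $\nabla f(X) R$ and $\nabla f(X)^\top L$. A minimal NumPy sketch of one Gauss-Southwell-type step follows, assuming the rule "update the factor with the larger partial-gradient norm" and the block step sizes $1/(\lambda\|R\|_2^2)$ and $1/(\lambda\|L\|_2^2)$. Both choices and the name gs_factor_step are illustrative assumptions, not the paper's algorithm, and no balancing of the factors is performed.

```python
import numpy as np

def gs_factor_step(L, R, grad_f, lam):
    """One Gauss-Southwell-type block step on f(L R^T) (illustrative sketch).

    grad_f: callable returning the Euclidean gradient of f at X (m x n).
    lam:    Lipschitz constant of grad_f (f is assumed lam-smooth).
    """
    G = grad_f(L @ R.T)
    gL = G @ R        # partial gradient with respect to L
    gR = G.T @ L      # partial gradient with respect to R
    if np.linalg.norm(gL) >= np.linalg.norm(gR):
        # The gradient of L -> f(L R^T) is (lam * ||R||_2^2)-Lipschitz.
        step = 1.0 / (lam * np.linalg.norm(R, 2) ** 2)
        return L - step * gL, R, "L"
    step = 1.0 / (lam * np.linalg.norm(L, 2) ** 2)
    return L, R - step * gR, "R"

# Usage: f(X) = 0.5 * ||X - M||_F^2 is 1-smooth; M has rank k.
rng = np.random.default_rng(1)
m, n, k = 10, 8, 3
M = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
f = lambda X: 0.5 * np.linalg.norm(X - M) ** 2
grad_f = lambda X: X - M
L, R = rng.standard_normal((m, k)), rng.standard_normal((n, k))
values = [f(L @ R.T)]
for _ in range(200):
    L, R, _ = gs_factor_step(L, R, grad_f, lam=1.0)
    values.append(f(L @ R.T))
```

With these step sizes the descent lemma guarantees that $f(LR^\top)$ never increases, whichever block is selected.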

Paper Structure (12 sections, 9 theorems, 105 equations, 3 figures, 1 table, 3 algorithms)

Introduction
Recap of gradient descent methods
Block gradient descent for low-rank matrices
Balanced factorized version
Embedded Riemannian version
Comparison of convergence guarantees
Linear convergence rate
Alternating least squares
Numerical experiments
Influence of balancing and orthogonalization
Influence of condition numbers $\kappa_{X_*}$ and $\kappa_{\mathcal{A}}$
Influence of line search

Key Result

Theorem 3.1

Assume that $f \colon \mathbb{R}^{m \times n} \to \mathbb{R}$ is $\lambda$-smooth and that the constrained sublevel set $N_0 \coloneqq \{X \in \mathcal{M}_{\le k} : f(X) \le f(L_0 R_0^\top)\}$ is contained in a ball $\{X \in \mathbb{R}^{m \times n} : \|X\|_2 \le \rho\}$. Depending where:
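One reading of why a bound on $\|X\|_2$ suits a factorized method (our interpretation, taking $\|\cdot\|_2$ as the spectral norm, not the paper's proof): for fixed $R$, the map $L \mapsto f(LR^\top)$ has gradient $\nabla f(LR^\top)R$, which is Lipschitz with constant $\lambda\|R\|_2^2$. If the factors are balanced, $L = U\Sigma^{1/2}$ and $R = V\Sigma^{1/2}$ from an SVD $X = U\Sigma V^\top$, then $\|L\|_2^2 = \|R\|_2^2 = \|X\|_2 \le \rho$ on $N_0$, so the step size $1/(\lambda\rho)$ is admissible for either block. The NumPy check below verifies these balancing identities; balanced_factors is a hypothetical helper.

```python
import numpy as np

def balanced_factors(X, k):
    """Balanced rank-k factors: L = U_k sqrt(S_k), R = V_k sqrt(S_k)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:k])
    return U[:, :k] * root, Vt[:k].T * root

rng = np.random.default_rng(2)
k = 3
X = rng.standard_normal((7, k)) @ rng.standard_normal((k, 5))  # rank-k matrix
L, R = balanced_factors(X, k)
spec = np.linalg.norm(X, 2)  # spectral norm ||X||_2 = sigma_1(X)

# Block Lipschitz constants of L -> f(L R^T) and R -> f(L R^T) for a lam-smooth f.
lam = 1.0
lip_L = lam * np.linalg.norm(R, 2) ** 2
lip_R = lam * np.linalg.norm(L, 2) ** 2
```

Both block constants equal $\lambda\|X\|_2$, and balancing also gives $L^\top L = R^\top R = \Sigma_k$.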

Figures (3)

Figure 1: Strongly convex quadratic function with condition number $\kappa_{\mathcal{A}} = 20$. The effective condition number of the minimizer $X_*$ is $\kappa_{X_*} = 5$. The shaded areas correspond to the 50 and 90 percentiles of the relevant quantity in each panel.
Figure 2: Same setting as Fig. 1, but now with several values of $\kappa_{X_*} \in \{1.1, 10, 100\}$. The shaded areas correspond, from left to right, to increasing values of $\kappa_{X_*}$.
Figure 3: Influence of the line search for a strongly convex quadratic function with fixed condition numbers $\kappa_{\mathcal{A}} = 40$ and $\kappa_{X_*} = 5$. The colored shaded areas correspond to the 90 percentile data over 20 random realizations. The grey lines indicate the median values.
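The experiments use strongly convex quadratics with prescribed $\kappa_{\mathcal{A}}$ and $\kappa_{X_*}$ and compare line-search settings. The sketch below builds one such problem and runs a factor-wise Gauss-Southwell step with Armijo backtracking. Several details are assumptions, not the paper's setup: $\mathcal{A}$ is taken as entrywise multiplication by weights in $[1, \kappa_{\mathcal{A}}]$, $\kappa_{X_*}$ is taken as $\sigma_1/\sigma_k$ of the rank-$k$ minimizer, and the line-search parameters are generic defaults. All function names are hypothetical.

```python
import numpy as np

def make_quadratic(m, n, k, kappa_A, kappa_X, seed=0):
    """Hypothetical test problem f(X) = 0.5 * sum(W * (X - X_star)**2).

    A(X) = W * X is diagonal with weights in [1, kappa_A], so cond(A) = kappa_A.
    X_star has rank k with sigma_1 / sigma_k = kappa_X.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(1.0, kappa_A, size=(m, n))
    W.flat[0], W.flat[1] = 1.0, kappa_A  # pin the extreme weights exactly
    U, _ = np.linalg.qr(rng.standard_normal((m, k)))
    V, _ = np.linalg.qr(rng.standard_normal((n, k)))
    X_star = (U * np.linspace(kappa_X, 1.0, k)) @ V.T
    f = lambda X: 0.5 * np.sum(W * (X - X_star) ** 2)
    grad_f = lambda X: W * (X - X_star)
    return f, grad_f, X_star

def armijo(phi, x, g, t0=1.0, beta=0.5, c=1e-4, max_iter=60):
    """Backtrack along -g until phi(x - t g) <= phi(x) - c t ||g||_F^2."""
    fx, gg, t = phi(x), float(np.sum(g * g)), t0
    for _ in range(max_iter):
        if phi(x - t * g) <= fx - c * t * gg:
            break
        t *= beta
    return t

def gs_step_armijo(L, R, f, grad_f):
    """Update the factor with the larger partial-gradient norm, step from Armijo."""
    G = grad_f(L @ R.T)
    gL, gR = G @ R, G.T @ L
    if np.linalg.norm(gL) >= np.linalg.norm(gR):
        t = armijo(lambda Z: f(Z @ R.T), L, gL)
        return L - t * gL, R
    t = armijo(lambda Z: f(L @ Z.T), R, gR)
    return L, R - t * gR

# Condition numbers as in Figure 3: kappa_A = 40, kappa_X* = 5.
f, grad_f, X_star = make_quadratic(30, 20, 3, kappa_A=40.0, kappa_X=5.0)
rng = np.random.default_rng(4)
L, R = rng.standard_normal((30, 3)), rng.standard_normal((20, 3))
values = [f(L @ R.T)]
for _ in range(300):
    L, R = gs_step_armijo(L, R, f, grad_f)
    values.append(f(L @ R.T))
```

Backtracking avoids the need to know $\lambda$ or $\|R\|_2$ in advance, at the price of extra function evaluations per step.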

Theorems & Definitions (17)

Theorem 3.1
Corollary 3.2
Remark 3.3
proof : Proof of Theorem 3.1
proof : Proof of Corollary 3.2
Definition
Theorem 3.4
Corollary 3.5
Lemma 3.6
proof
...and 7 more