Gauss-Southwell type descent methods for low-rank matrix optimization

Guillaume Olikier, André Uschmajew, Bart Vandereycken

TL;DR

The paper addresses low-rank matrix optimization with a rank constraint by exploiting a factorized representation and two Gauss-Southwell-like descent schemes. It develops a balanced factorized method and an embedded Riemannian method based on tangent-space projections, proving global gradient-type convergence for both and a global $O(1/\sqrt{\ell})$ rate for the Riemannian variant. A local linear convergence result is established when a point has a positive-definite Riemannian Hessian, and the analysis yields novel convergence insights for alternating least squares. Numerical experiments show that the Riemannian-subspace approach is more robust to small singular values and ill-conditioning, with faster convergence and better stability than the balanced factorization, particularly under challenging conditioning and line-search settings.

Abstract

We consider gradient-related methods for low-rank matrix optimization with a smooth cost function. The methods operate on single factors of the low-rank factorization and share aspects of both alternating and Riemannian optimization. Two possible choices for the search directions based on Gauss-Southwell type selection rules are compared: one using the gradient of a factorized non-convex formulation, the other using the Riemannian gradient. While both methods provide gradient convergence guarantees that are similar to the unconstrained case, numerical experiments on a quadratic cost function indicate that the version based on the Riemannian gradient is significantly more robust with respect to small singular values and the condition number of the cost function. As a side result of our approach, we also obtain new convergence results for the alternating least squares method.
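
The idea of operating on a single factor chosen by a Gauss-Southwell type rule can be illustrated with a minimal sketch. It assumes the quadratic cost $f(X) = \tfrac{1}{2}\|X - A\|_F^2$ with $X = L R^\top$ and, at each step, updates only the factor whose partial gradient has the larger norm. This is a plain illustration of the selection rule for the factorized formulation, not the paper's balanced or Riemannian method: there is no balancing and no tangent-space projection, and all function names, the problem sizes, and the step size $1/\|R\|_F^2$ (a conservative bound on the block Lipschitz constant) are assumptions made for the example.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sq_norm(A):
    """Squared Frobenius norm."""
    return sum(x * x for row in A for x in row)

def residual(L, R, A):
    """G = L R^T - A, the Euclidean gradient of f at X = L R^T."""
    X = matmul(L, transpose(R))
    return [[x - a for x, a in zip(rx, ra)] for rx, ra in zip(X, A)]

def cost(L, R, A):
    """f(L R^T) = 0.5 * ||L R^T - A||_F^2."""
    return 0.5 * sq_norm(residual(L, R, A))

def gauss_southwell_step(L, R, A):
    """Take a gradient step on the factor with the larger partial gradient."""
    G = residual(L, R, A)
    gL = matmul(G, R)             # partial gradient with respect to L
    gR = matmul(transpose(G), L)  # partial gradient with respect to R
    if max(sq_norm(gL), sq_norm(gR)) == 0.0:
        return L, R               # stationary point of the factorized cost
    if sq_norm(gL) >= sq_norm(gR):
        step = 1.0 / sq_norm(R)   # ||R||_F^2 >= ||R||_2^2, so this step is safe
        L = [[l - step * g for l, g in zip(rl, rg)] for rl, rg in zip(L, gL)]
    else:
        step = 1.0 / sq_norm(L)
        R = [[r - step * g for r, g in zip(rr, rg)] for rr, rg in zip(R, gR)]
    return L, R

def run(m=6, n=5, k=2, iters=200, seed=0):
    """Run the sketch on a random rank-k target and return the cost history."""
    rng = random.Random(seed)

    def rand(p, q):
        return [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(p)]

    A = matmul(rand(m, k), transpose(rand(n, k)))  # hypothetical rank-k target
    L, R = rand(m, k), rand(n, k)
    history = [cost(L, R, A)]
    for _ in range(iters):
        L, R = gauss_southwell_step(L, R, A)
        history.append(cost(L, R, A))
    return history
```

Calling `run()` returns the sequence of cost values. With the conservative step size used here the cost cannot increase from one iteration to the next, although the speed of decrease depends on the scaling of the factors, which is the sensitivity that the abstract attributes to the factorized variant.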

Paper Structure

This paper contains 12 sections, 9 theorems, 105 equations, 3 figures, 1 table, 3 algorithms.

Key Result

Theorem 3.1

Assume that $f \colon {\mathbb{R}}^{m \times n} \to {\mathbb{R}}$ is $\lambda$-smooth and that the constrained sublevel set $N_0 \coloneqq \{X \in \mathcal{M}_{\le k} : f(X) \le f(L_0^{} R_0^\top) \}$ is contained in a ball $\{X \in {\mathbb{R}}^{m \times n} \colon \| X \|_2 \le \rho \}$. Depending where:

Figures (3)

  • Figure 1: Strongly convex quadratic function with condition number $\kappa_{\mathcal{A}} = 20$. The effective condition number of the minimizer $X_*$ is $\kappa_{X_*} = 5$. The shaded areas correspond to the 50th and 90th percentiles of the relevant quantity in each panel.
  • Figure 2: Same setting as Fig. 1 but now with several values of $\kappa_{X_*} \in \{ 1.1, 10, 100 \}$. The shaded areas correspond, from left to right, to increasing values of $\kappa_{X_*}$.
  • Figure 3: Influence of the line search for a strongly convex quadratic function with fixed condition numbers $\kappa_{\mathcal{A}} = 40$ and $\kappa_{X_*} = 5$. The colored shaded areas correspond to the 90th percentile of the data over 20 random realizations. The grey lines indicate the median values.

Theorems & Definitions (17)

  • Theorem 3.1
  • Corollary 3.2
  • Remark 3.3
  • proof : Proof of Theorem 3.1
  • proof : Proof of Corollary 3.2
  • Definition
  • Theorem 3.4
  • Corollary 3.5
  • Lemma 3.6
  • proof
  • ...and 7 more