Faster Randomized Methods for Orthogonality Constrained Problems

Boris Shustin, Haim Avron

TL;DR

The paper addresses optimization problems with generalized orthogonality constraints by integrating randomized preconditioning into the Riemannian optimization framework. The authors design constant SPD metrics ${\bf M}$ constructed from data sketches to approximate Gram matrices, yielding well-conditioned Riemannian Hessians and faster convergence for problems such as canonical correlation analysis (CCA) and Fisher linear discriminant analysis (FDA). They provide theoretical guarantees that bound the Hessian condition number in terms of sketch quality, and they demonstrate substantial empirical speedups on real and synthetic datasets, including warm-start benefits. The approach reduces preprocessing costs to near-linear in data size while preserving convergence behavior across zeroth-, first-, and second-order Riemannian methods. Overall, the work offers a practical, theory-backed pathway to scalable optimization under orthogonality constraints with broad applicability in data analysis tasks.
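
To make the idea of a sketched Gram matrix concrete, here is a minimal NumPy sketch of the linear-algebra ingredient only. It is not the authors' algorithm and performs no Riemannian optimization. The synthetic matrix `Z`, the sizes `n`, `d`, `s`, and the use of a dense Gaussian sketch (chosen for brevity; the paper's figures refer to CountSketch) are all assumptions made for illustration. Here `M` stands in for a sketched approximation of the Gram matrix `Z.T @ Z`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s = 5000, 20, 400  # illustrative sizes (assumption)

# Synthetic ill-conditioned data: column scales span four orders of magnitude.
Z = rng.standard_normal((n, d)) * np.logspace(0, 4, d)

# Dense Gaussian sketch of Z (assumption: stands in for a faster transform).
SZ = rng.standard_normal((s, n)) @ Z / np.sqrt(s)

# Sketched Gram matrix M ~ Z^T Z, factored as M = R^T R with R upper triangular.
M = SZ.T @ SZ
R = np.linalg.cholesky(M).T

# Form Z R^{-1} by solving R^T X = Z^T and transposing the solution.
ZR = np.linalg.solve(R.T, Z.T).T

cond_before = np.linalg.cond(Z.T @ Z)    # conditioning of the raw Gram matrix
cond_after = np.linalg.cond(ZR.T @ ZR)   # conditioning after preconditioning by M

print(f"condition number before: {cond_before:.2e}")
print(f"condition number after:  {cond_after:.2f}")
```

When the sketch captures the column space of `Z` well, the second number should be a small constant even though the first is very large, which is the kind of behavior the summary attributes to sketch-based metrics.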

Abstract

Recent literature has advocated the use of randomized methods for accelerating the solution of various matrix problems arising throughout data science and computational science. One popular strategy for leveraging randomization is to use it as a way to reduce problem size. However, methods based on this strategy lack sufficient accuracy for some applications. Randomized preconditioning is another approach for leveraging randomization, which provides higher accuracy. The main challenge in using randomized preconditioning is the need for an underlying iterative method; consequently, randomized preconditioning has so far been applied almost exclusively to solving regression problems and linear systems. In this article, we show how to expand the application of randomized preconditioning to another important set of problems prevalent across data science: optimization problems with (generalized) orthogonality constraints. We demonstrate our approach, which is based on the framework of Riemannian optimization and Riemannian preconditioning, on the problem of computing the dominant canonical correlations and on the Fisher linear discriminant analysis problem. For both problems, we evaluate the effect of preconditioning on the computational costs and asymptotic convergence, and demonstrate empirically the utility of our approach.

Paper Structure

This paper contains 36 sections, 14 theorems, 147 equations, 6 figures, 2 tables, 2 algorithms.

Key Result

Lemma 2

Assume that $\lambda>0$ or that $\mathbf{Z}\in\mathbb{R}^{n\times d}$ has full column rank. Let $s_{\lambda}(\mathbf{Z})\coloneqq\operatorname{Tr}\left((\mathbf{Z}^{\mathsf{T}}\mathbf{Z}+\lambda\mathbf{I})^{-1}\mathbf{Z}^{\mathsf{T}}\mathbf{Z}\right)$. Suppose
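
The quantity $s_{\lambda}$ defined in the lemma can be evaluated directly. The following is a minimal NumPy sketch, with hypothetical function names `s_lambda` and `s_lambda_svd`, that computes it once from the trace definition and once from the singular values $\sigma_i$ of the matrix, using the standard identity $s_{\lambda}=\sum_i \sigma_i^2/(\sigma_i^2+\lambda)$. It illustrates the definition only and says nothing about the remainder of the lemma, which is truncated here.

```python
import numpy as np

def s_lambda(Z, lam):
    """Trace definition: Tr((Z^T Z + lam*I)^{-1} Z^T Z)."""
    d = Z.shape[1]
    G = Z.T @ Z
    # Solve (G + lam*I) X = G instead of forming an explicit inverse.
    return np.trace(np.linalg.solve(G + lam * np.eye(d), G))

def s_lambda_svd(Z, lam):
    """Equivalent form: sum over singular values of sigma^2 / (sigma^2 + lam)."""
    sig2 = np.linalg.svd(Z, compute_uv=False) ** 2
    return np.sum(sig2 / (sig2 + lam))

rng = np.random.default_rng(0)
Z = rng.standard_normal((50, 5))   # illustrative full-column-rank matrix
print(s_lambda(Z, 1.0), s_lambda_svd(Z, 1.0))
```

With $\lambda=0$ and full column rank the value equals $d$; for $\lambda>0$ it is strictly smaller than $d$.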

Figures (6)

  • Figure 6.1: Results for CCA on MNIST. For CountSketch the number of rows is $s$, and $k$ is the number of singular vectors for the dominant subspace preconditioner.
  • Figure 6.2: Results for FDA on MNIST. For CountSketch the number of rows is $s$, and $k$ is the number of singular vectors for the dominant subspace preconditioner.
  • Figure 6.3: Results for CCA on MEDIAMILL. For CountSketch the number of rows is $s$, and $k$ is the number of singular vectors for the dominant subspace preconditioner.
  • Figure 6.4: Results for FDA on COVTYPE. For CountSketch the number of rows is $s$, and $k$ is the number of singular vectors for the dominant subspace preconditioner.
  • Figure 6.5: Results for CCA on a synthetic experiment. Left: suboptimality vs. time in seconds. Right: suboptimality vs. iteration count. We use CountSketch as the sketching transform, where $s$ is the number of rows after sketching is applied to the data matrices.
  • ...and 1 more figure
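
The captions above refer to CountSketch with $s$ rows. As a rough illustration of what such a transform does, here is a minimal NumPy sketch under the usual textbook definition: each data row is assigned to one of $s$ buckets uniformly at random, multiplied by a random sign, and added into that bucket. The function name `countsketch` and the sizes used are hypothetical, and this is not necessarily how the authors implemented it.

```python
import numpy as np

def countsketch(Z, s, rng):
    """Return S @ Z for a random CountSketch matrix S with s rows."""
    n, d = Z.shape
    buckets = rng.integers(0, s, size=n)      # target row of the sketch for each data row
    signs = rng.choice([-1.0, 1.0], size=n)   # random sign for each data row
    SZ = np.zeros((s, d))
    # Unbuffered accumulation so rows sharing a bucket are summed correctly.
    np.add.at(SZ, buckets, signs[:, None] * Z)
    return SZ

rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 8))   # illustrative data matrix
SZ = countsketch(Z, s=200, rng=rng)
print(SZ.shape)  # (200, 8)
```

Because every data row is touched exactly once, applying the transform costs time proportional to the number of nonzeros in the data, which is why this family of sketches is attractive for large matrices.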

Theorems & Definitions (34)

  • Remark 1
  • Lemma 2
  • Proof
  • Remark 3
  • Definition 4
  • Theorem 5
  • Theorem 6
  • Remark 7
  • Theorem 8
  • Theorem 9
  • ...and 24 more