
Scalable Second-Order Optimization Algorithms for Minimizing Low-rank Functions

Edward Tansley, Coralia Cartis

TL;DR

This work addresses the challenge of applying second-order optimization to high-dimensional problems by exploiting low-rank structure through a random-subspace cubic regularization approach. It introduces R-ARC-D, a variant of R-ARC that adapts the sketch size (the subspace dimension) to the observed rank of the projected Hessian, achieving the optimal $\mathcal{O}(\epsilon^{-3/2})$ convergence rate while keeping the sketch size $l_k$ on the order of the true rank $r$. Theoretical guarantees show that the adaptive rule preserves this convergence rate under Gaussian embeddings, and numerical experiments on augmented low-rank CUTEst problems demonstrate substantial efficiency gains and the ability to learn the rank. The findings make scalable second-order methods more practical for high-dimensional low-rank objectives, with applications in machine learning and hyperparameter optimization.

Abstract

We present a random-subspace variant of the cubic regularization algorithm that chooses the size of the subspace adaptively, based on the rank of the projected second derivative matrix. At each iteration, our variant requires access only to (small-dimensional) projections of the first- and second-order problem derivatives and calculates a reduced step inexpensively. The resulting method maintains the optimal global rate of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. For such functions, our algorithm naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
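
As a rough illustration of the projections described in the abstract, the following Python sketch forms a reduced gradient and a reduced Hessian with a scaled Gaussian matrix and reads off the rank of the projected Hessian. It is a minimal sketch under stated assumptions, not the authors' implementation: the quadratic test function, the helper name `sketched_derivatives`, the sizes `d`, `r`, `l`, and the rank tolerance are all hypothetical choices made for this example, and the cubic-regularized step and the sketch-size update rule of R-ARC-D are not reproduced here.

```python
import numpy as np


def sketched_derivatives(grad, hess, l, rng):
    """Project a gradient and Hessian with a scaled l x d Gaussian matrix S.

    Hypothetical helper for illustration; returns (S g, S H S^T).
    """
    d = grad.shape[0]
    S = rng.standard_normal((l, d)) / np.sqrt(l)  # entries ~ N(0, 1/l)
    return S @ grad, S @ hess @ S.T


rng = np.random.default_rng(0)
d, r, l = 200, 5, 12  # ambient dimension, true rank, sketch size (assumed values)

# Assumed low-rank test function f(x) = 0.5 * ||A x||^2 with A of shape (r, d):
# its gradient is A^T A x and its Hessian A^T A has rank r.
A = rng.standard_normal((r, d))
x = rng.standard_normal(d)
grad = A.T @ (A @ x)
hess = A.T @ A

g_hat, H_hat = sketched_derivatives(grad, hess, l, rng)

# Numerical rank of the l x l projected Hessian, with an explicit relative tolerance.
tol = 1e-8 * np.linalg.norm(H_hat, 2)
observed_rank = np.linalg.matrix_rank(H_hat, tol=tol)

print(g_hat.shape, H_hat.shape, observed_rank)
```

In this example the sketch size `l` exceeds the true rank `r`, so the projected Hessian is rank-deficient and its observed rank matches `r`; this is the kind of information an adaptive rule could use to keep the sketch size close to the true rank.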

Paper Structure (17 sections, 7 theorems, 15 equations, 4 figures, 1 table)


Key Result

Theorem 1

Suppose that $\mathcal{S}$ is the distribution of (scaled) $l \times d$ Gaussian matrices with $l = \mathcal{O}(r+1)$, where $r \leq d$ is an upper bound on the maximum rank of $\,\nabla^{2}f(x_k)$ across all iterations, and that $f$ has globally Lipschitz continuous second derivatives. Then R-ARC a

Figures (4)

  • Figure 1: Example of R-ARC-D applied to the low-rank problem l-ARTIF
  • Figure 2: Data profiles of R-ARC-D compared to R-ARC and ARC
  • Figure 3: Comparison between R-ARC-D and R-ARC on low-rank problems from Table 1
  • Figure 4: Example of R-ARC-D applied to the full-rank problem ARTIF (with parameter N = 1000), which has $r = d = 1000$.

Theorems & Definitions (10)

  • Theorem 1: Informal [Zhen-PhD, shaoRandomsubspaceAdaptiveCubic2022]
  • Definition 2: Low-rank Functions [wang_bayesian_2016]
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 6: R-ARC-D convergence result
  • Definition 7: $\epsilon$-subspace embedding [10.1561/0400000060]
  • Definition 8: Oblivious subspace embedding [10.1561/0400000060, 10.1109/FOCS.2006.37]
  • Lemma 9: Theorem 2.3 in [10.1561/0400000060]
  • Lemma 10 [cartis_learning_2024, cosson_gradient_2022]