Scalable Second-Order Optimization Algorithms for Minimizing Low-rank Functions

Edward Tansley; Coralia Cartis

TL;DR

This work addresses the challenge of applying second-order optimization to high-dimensional problems by exploiting low-rank structure through random-subspace cubic regularization. It introduces R-ARC-D, a variant of R-ARC that adapts the sketch size (the subspace dimension) to the observed rank of the sketched Hessian, achieving the optimal $\mathcal{O}(\epsilon^{-3/2})$ convergence rate while keeping the sketch size $l_k$ on the order of the true rank $r$. Theoretical guarantees show that the adaptive rule preserves the convergence rate under Gaussian embeddings, and numerical experiments on augmented low-rank CUTEst problems demonstrate substantial efficiency gains and the ability to learn the rank. These results make scalable second-order methods more practical for high-dimensional low-rank objectives, with potential applications in machine learning and hyperparameter optimization.
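The link between the function's rank and the rank seen through a sketch can be illustrated numerically. The following is a minimal sketch, not the paper's code: the test objective $f(x) = \tfrac12\|Ax\|^2 + \tfrac14\sum_i (Ax)_i^4$, the helper `numerical_rank`, and the dimensions `d`, `r`, `l` are all assumptions chosen for illustration. It shows that a scaled Gaussian sketch with more than $r$ rows reveals the true Hessian rank, while a sketch with fewer than $r$ rows produces a full-rank sketched Hessian.

```python
import numpy as np

def numerical_rank(M, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    sv = np.linalg.svd(M, compute_uv=False)
    if sv.size == 0 or sv[0] == 0.0:
        return 0
    return int(np.sum(sv > rel_tol * sv[0]))

rng = np.random.default_rng(0)
d, r, l = 200, 5, 12  # ambient dimension, true rank, sketch size (all illustrative)

# Hypothetical low-rank objective: f depends on x only through z = A x in R^r.
A = rng.standard_normal((r, d)) / np.sqrt(d)

def hessian(x):
    # Hessian of 0.5*||A x||^2 + 0.25*sum((A x)^4) is A^T diag(1 + 3 z^2) A.
    z = A @ x
    return A.T @ np.diag(1.0 + 3.0 * z**2) @ A

x = rng.standard_normal(d)
H = hessian(x)

# Scaled Gaussian sketch with l > r rows: the l x l sketched Hessian has rank r.
S = rng.standard_normal((l, d)) / np.sqrt(l)
rank_full = numerical_rank(H)
rank_sketched = numerical_rank(S @ H @ S.T)

# Sketch with fewer than r rows: the sketched Hessian is full rank (3 of 3).
S_small = rng.standard_normal((3, d)) / np.sqrt(3)
rank_small = numerical_rank(S_small @ H @ S_small.T)
```

A full-rank sketched Hessian therefore suggests the sketch may be too small, whereas a rank-deficient one indicates the sketch already covers the rank. How R-ARC-D turns this observation into an update of $l_k$ is not specified in this summary, so no update rule is shown here.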

Abstract

We present a random-subspace variant of the cubic regularization algorithm that chooses the size of the subspace adaptively, based on the rank of the projected second derivative matrix. At each iteration, our variant requires access only to (small-dimensional) projections of the first- and second-order derivatives of the problem and computes a reduced step inexpensively. The resulting method maintains the optimal global rate of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. For such functions, our algorithm naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
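The reduced step described above can be sketched as follows. This is an illustrative sketch under several assumptions, not the authors' implementation: the function names `reduced_cubic_step` and `sketched_step` are hypothetical, the cubic term regularises the reduced variable directly (the paper's model may regularise the lifted step instead), the subproblem is solved by plain bisection on the secular equation with the so-called hard case ignored, and the full Hessian is formed only for brevity.

```python
import numpy as np

def reduced_cubic_step(g_s, H_s, sigma, tol=1e-10, max_iter=200):
    """Minimise g_s^T s + 0.5 s^T H_s s + (sigma/3) ||s||^3 over s in R^l.

    Uses the optimality condition (H_s + lam I) s = -g_s with lam = sigma ||s||
    and bisects on lam. The hard case is not handled (assumption).
    """
    eigvals, Q = np.linalg.eigh(H_s)  # eigenvalues in ascending order
    g_rot = Q.T @ g_s

    def step_norm(lam):
        return np.linalg.norm(g_rot / (eigvals + lam))

    # lam must make H_s + lam I positive definite.
    lo = max(0.0, -eigvals[0]) + 1e-10 * max(1.0, abs(eigvals[0]))
    hi = lo + 1.0
    while step_norm(hi) > hi / sigma:  # grow until ||s(hi)|| <= hi / sigma
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if step_norm(mid) > mid / sigma:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * max(1.0, hi):
            break
    lam = 0.5 * (lo + hi)
    return Q @ (-g_rot / (eigvals + lam))

def sketched_step(grad, hess, sigma, l, rng):
    """One random-subspace step: sketch, solve in R^l, lift back to R^d."""
    d = grad.shape[0]
    S = rng.standard_normal((l, d)) / np.sqrt(l)  # scaled Gaussian sketch
    g_s = S @ grad        # l-dimensional projected gradient
    H_s = S @ hess @ S.T  # l x l projected Hessian (full hess used for brevity)
    s_hat = reduced_cubic_step(g_s, H_s, sigma)
    return S.T @ s_hat    # step in the original d-dimensional space
```

In a scalable setting one would obtain `g_s` and `H_s` from directional derivatives along the rows of `S` rather than from the full gradient and Hessian; only the $l$-dimensional quantities enter the subproblem, whose cost is dominated by an $l \times l$ eigendecomposition.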

Paper Structure (17 sections, 7 theorems, 15 equations, 4 figures, 1 table)

Introduction
ARC and R-ARC
Low-rank functions
Contributions
Algorithm and Main Results
Evaluating sketched problem information
An adaptive sketch size rule
Numerical Experiments
Augmented CUTEst problems
R-ARC-D update step
Data Profiles
Acknowledgments
Additional results
Proof of Lemma \ref{lem:low:rank:hessian}
Data profile methodology
...and 2 more sections

Key Result

Theorem 1

Suppose that $\mathcal{S}$ is the distribution of (scaled) $l \times d$ Gaussian matrices with $l = \mathcal{O}(r+1)$, where $r \leq d$ is an upper bound on the maximum rank of $\,\nabla^{2}f(x_k)$ across all iterations, and that $f$ has globally Lipschitz continuous second derivatives. Then R-ARC a

Figures (4)

Figure 1: Example of R-ARC-D applied to the low-rank problem l-ARTIF
Figure 2: Data profiles of R-ARC-D compared to R-ARC and ARC
Figure 3: Comparison between R-ARC-D and R-ARC on low-rank problems from Table 1
Figure 4: Example of R-ARC-D applied to the full-rank problem ARTIF (with parameter N = 1000), which has $r = d = 1000$.

Theorems & Definitions (10)

Theorem 1: Informal [Zhen-PhD; shaoRandomsubspaceAdaptiveCubic2022]
Definition 2: Low-rank Functions [wang_bayesian_2016]
Lemma 3
Lemma 4
Lemma 5
Theorem 6: R-ARC-D convergence result
Definition 7: $\epsilon$-subspace embedding [10.1561/0400000060]
Definition 8: Oblivious subspace embedding [10.1561/0400000060; 10.1109/FOCS.2006.37]
Lemma 9: Theorem 2.3 in [10.1561/0400000060]
Lemma 10: [cartis_learning_2024; cosson_gradient_2022]
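
For orientation, Definition 7 is usually stated in the sketching literature in the following standard form. This is given under the assumption that the paper follows the usual convention; the symbols $B$, $k$ and $z$ are chosen here for illustration and may differ from the paper's notation.

```latex
% S \in \mathbb{R}^{l \times d} is an \epsilon-subspace embedding
% for a matrix B \in \mathbb{R}^{d \times k} if
(1-\epsilon)\,\|Bz\|_2^2 \;\le\; \|SBz\|_2^2 \;\le\; (1+\epsilon)\,\|Bz\|_2^2
\quad \text{for all } z \in \mathbb{R}^{k}.
```

An oblivious subspace embedding (Definition 8) is then, in the usual formulation, a distribution over matrices $S$ such that, for any fixed $B$, a draw from it is an $\epsilon$-subspace embedding for $B$ with probability at least $1-\delta$.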
