Random Subspace Cubic-Regularization Methods, with Applications to Low-Rank Functions
Coralia Cartis, Zhen Shao, Edward Tansley
TL;DR
This work develops Random Subspace ARC (R-ARC), a second-order optimization framework that operates in randomly sampled low-dimensional subspaces to scale cubic-regularization methods to high-dimensional problems. By projecting gradients and Hessians with a sketching matrix and solving a reduced cubic model, R-ARC attains, with high probability, the optimal $O(\epsilon^{-3/2})$ iteration complexity for reaching an $\epsilon$-approximate first-order critical point; the analysis also extends to second-order criticality in both the subspace and the full space. The paper further introduces R-ARC-D, an adaptive variant that automatically adjusts the sketch size to the Hessian’s effective rank, enabling substantial dimensionality reduction for low-rank functions while retaining the same convergence guarantees. Numerical experiments on full-rank and low-rank CUTEst problems show favorable performance of R-ARC and R-ARC-D relative to full-space ARC, particularly when the problem has low effective dimensionality or when the computational budget favors cheaper inner problems. Together, these results support random subspace cubic regularization as a scalable, theoretically grounded approach to high-dimensional second-order optimization.
Abstract
We propose and analyze random subspace variants of the second-order Adaptive Regularization using Cubics (ARC) algorithm. These methods iteratively restrict the search space to some random subspace of the parameters, constructing and minimizing a local model only within this subspace. Thus, our variants only require access to (small-dimensional) projections of first- and second-order problem derivatives and calculate a reduced step inexpensively. Under suitable assumptions, the ensuing methods maintain the optimal first-order, and second-order, global rates of convergence of (full-dimensional) cubic regularization, while showing improved scalability both theoretically and numerically, particularly when applied to low-rank functions. When applied to the latter, our adaptive variant naturally adapts the subspace size to the true rank of the function, without knowing it a priori.
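To make the idea of building and minimizing a cubic model only within a random subspace concrete, the following is a minimal sketch rather than the authors' algorithm. Everything named in it is an assumption made for illustration: the function names `solve_cubic_model` and `r_arc`, the use of an orthonormal basis of a random Gaussian subspace as the sketching matrix `S`, the regularization weight `sigma` with a simple halve/double update, the acceptance threshold `eta`, and the low-rank least-squares test function. The full gradient and Hessian are formed only for brevity; a practical implementation would access just the projections `S @ g` and `S @ H @ S.T`.

```python
import numpy as np


def solve_cubic_model(g, H, sigma):
    """Minimize g.s + 0.5 s.H.s + (sigma/3)||s||^3 for a small dense problem.

    Uses s = -(H + lam I)^{-1} g with lam = sigma * ||s|| and
    lam >= max(0, -lambda_min(H)). The degenerate "hard case" is not handled.
    """
    w, Q = np.linalg.eigh(H)              # eigenvalues in ascending order
    gt = Q.T @ g                          # gradient in the eigenbasis

    def step_norm(lam):
        return np.linalg.norm(gt / (w + lam))

    # Smallest admissible shift, nudged so that H + lam I is positive definite.
    lo = max(0.0, -w[0]) + 1e-10 * max(1.0, np.abs(w).max())
    if sigma * step_norm(lo) <= lo:       # root sits at (or below) the bound
        lam = lo
    else:
        hi = 2.0 * lo + 1.0
        while sigma * step_norm(hi) > hi:  # bracket the root
            hi *= 2.0
        for _ in range(100):              # bisection on sigma*||s(lam)|| = lam
            mid = 0.5 * (lo + hi)
            if sigma * step_norm(mid) > mid:
                lo = mid
            else:
                hi = mid
        lam = hi
    return -Q @ (gt / (w + lam))


def r_arc(f, grad, hess, x0, sketch_dim, sigma=1.0, eta=0.1, gamma=2.0,
          sigma_min=1e-8, tol=1e-6, max_iter=200, seed=0):
    """Illustrative random-subspace cubic-regularization loop (assumed details)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    d = x.size
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:      # full gradient used only to stop the demo
            break
        # Orthonormal basis of a random subspace, so ||S.T @ s_hat|| = ||s_hat||.
        Q, _ = np.linalg.qr(rng.standard_normal((d, sketch_dim)))
        S = Q.T                           # shape (sketch_dim, d)
        g_hat = S @ g                     # projected gradient
        H_hat = S @ hess(x) @ S.T         # projected Hessian
        s_hat = solve_cubic_model(g_hat, H_hat, sigma)
        # Decrease predicted by the reduced cubic model.
        pred = -(g_hat @ s_hat + 0.5 * s_hat @ H_hat @ s_hat
                 + sigma / 3.0 * np.linalg.norm(s_hat) ** 3)
        x_trial = x + S.T @ s_hat         # lift the reduced step to full space
        actual = f(x) - f(x_trial)
        if pred > 0 and actual >= eta * pred:
            x = x_trial                   # successful step: relax regularization
            sigma = max(sigma / gamma, sigma_min)
        else:
            sigma *= gamma                # unsuccessful step: regularize more
    return x


# Hypothetical low-rank test function: f depends on x only through A @ x.
demo_rng = np.random.default_rng(1)
d, r = 20, 2
A = demo_rng.standard_normal((r, d))
b = demo_rng.standard_normal(r)


def f(x):
    return 0.5 * np.sum((A @ x - b) ** 2)


def grad(x):
    return A.T @ (A @ x - b)


def hess(x):
    return A.T @ A


x0 = np.zeros(d)
x_final = r_arc(f, grad, hess, x0, sketch_dim=4)
print("f(x0) =", f(x0), " f(x_final) =", f(x_final))
```

In this toy problem the Hessian has rank 2 in dimension 20, so a sketch of size 4 already captures all the curvature, which is the situation the abstract describes for low-rank functions. The fixed `sketch_dim` here does not reproduce the adaptive sketch-size rule of the paper's adaptive variant.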
